DOCUMENT RESUME

ED 095 202                                        80 TM 003 674

AUTHOR          Tronsgard, David T.; And Others
TITLE           Statewide Educational Evaluation.
INSTITUTION     National Association of State Boards of Education,
                Denver, Colo.
SPONS AGENCY    Bureau of Elementary and Secondary Education
                (DHEW/OE), Washington, D.C. Div. of State Agency
                Cooperation.
PUB DATE        74
NOTE            78p.
AVAILABLE FROM  National Association of State Boards of Education,
                2480 W. 26th Avenue, Suite 215-B, Denver, Colorado
                80211 ($3.00)

EDRS PRICE      MF-$0.75 HC-$4.20 PLUS POSTAGE
DESCRIPTORS     Academic Achievement; *Educational Assessment;
                Evaluation Criteria; *Evaluation Methods; Information
                Dissemination; Learning; *State Boards of Education;
                *State Programs; Statistics; Testing; *Testing
                Programs
IDENTIFIERS     Elementary Secondary Education Act Title V; ESEA
                Title V

ABSTRACT
This book has been written for state board of
education members and other citizens interested in public education.
It is, in a sense, a primer in matters relating to learning, testing,
assessment, and evaluation. Presented are some philosophical and
political considerations in statewide educational evaluation.
Learning is defined and the types and levels of learning are
discussed. The remaining sections are devoted to: the measurement and
evaluation of student learning; the problem of appropriate
educational criteria; some suggestions for reporting the outcomes of
evaluating student learning; and some rules of thumb which state
school board members might employ to assist them in the evaluation of
student learning resulting from curricula under their purview. The
appendixes contain a directory of key state educational evaluation
personnel and contracting agencies used by state education
departments for matters relating to assessment. (Author/BC)



STATEWIDE
EDUCATIONAL
EVALUATION

David T. Tronsgard
Michael J. Grady, Jr.
E. Dean Coon

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



The National Association of State Boards of Education 
Denver, Colorado 
1974 



Financed by funds provided by the Elementary and Secondary
Education Act of 1965 (Public Law 89-10, Title V, Section 505) and
greatly assisted by the sponsoring and cooperating states: 

New York 

Alabama 

Nevada 

New Jersey 

Kansas 

Colorado 



Copies of this book may be obtained from:
National Association of State Boards of Education,
2480 West 26th Avenue - Suite 215-B
Denver, Colorado 80211 






FOREWORD 



This little book has been written for state board of education mem- 
bers and other citizens interested in public education. It is, in a sense, 
a primer in matters relating to learning, testing, assessment and evalu- 
ation. One unusual feature of the book is that it begins with a glossary. 
Educators are famous for their argot. Often they use complicated 
words for simple ideas. In the case of words relating to evaluation, 
the technical words are usually more justified and the meanings pre- 
cise. For this reason, the authors of the book believe that laymen will 
want to learn the most common meanings for the jargon relating to 
learning and measurement. These words are used in the text and 
hopefully as defined. 

Reading the book should prove a helpful exercise prior to important
discussions of statewide assessment. The issues delineated
should at least be considered before decisions are made. After all,
very important educational and social consequences are at stake.

David T. Tronsgard 
Executive Secretary 
NASBE






ABOUT THE AUTHORS 



David T. Tronsgard has served for four years as Executive Secretary
of NASBE. Prior to that he was a college professor and administrator,
and a superintendent of schools. He has an A.B. degree from San 
Diego State College, an M.A. degree from Teachers College, Columbia 
University and an Ed.D. degree from Stanford University. 

Michael J. Grady, Jr., is presently an independent educational consultant.
A retired Air Force Lieutenant Colonel, Mr. Grady has previously
been a director of research and an employee of the Colorado
Department of Education. He received an Ed.B. degree from Rhode
Island College, and M.A. and Ph.D. degrees from the University of Alabama,
the latter in educational and counseling psychology.

E. Dean Coon is currently an associate professor and assistant director
for the Center for Northern Educational Research, University of
Alaska. Previously he was Associate Commissioner of Education in 
Colorado. He received his doctorate from the University of Denver. His 
M.A. degree is from University of Nebraska and his A.B. degree from 
Colorado State College in Greeley. 






TABLE OF CONTENTS

Glossary

Chapter I: Statewide Educational Evaluation
Some Philosophical Considerations

Chapter II: Statewide Educational Evaluation
Some Political Considerations

Chapter III: Learning

Chapter IV: Measurement

Chapter V: Evaluation

Chapter VI: Statewide Educational Evaluation
Some Technical Considerations

Chapter VII: Some Final Considerations

References

Appendix I: Directory of Key SEE Assessment and
Evaluation Personnel

Appendix II: Contracting Agencies Used by State Education
Departments for Matters Relating to Assessment






GLOSSARY

Accountability - The process of being accountable. In education,
accountability refers to the proper stewardship of educational funds,
resources, and the generation of appropriate service for learners. It
may be defined narrowly, as by legislative mandate for public
disclosure, or broadly, as by parental expectations.

Affective learning - Learning which describes changes in interest,
attitudes, and values, and the development of appreciations and
adequate adjustment. Citizenship skills, for instance, are affective.

Arithmetic mean - Average of all the scores in the distribution; the
scores added and divided by the total number of scores.

Cognitive learning - Learning which deals with the recall and
recognition of knowledge and the development of intellectual abilities
and skills; academic learning.

Comprehensiveness - A test is comprehensive if the test items
adequately sample the full range of objectives and subject matter in a
curriculum; the degree to which a test is narrow or broad.

Criterion-referenced test - A test designed to measure the achievement
or mastery of school objectives; a test reflecting pre-determined
values, judgment, or professional standards and scored a priori
without regard to norms.

Differentiation - In testing, when test items discriminate between
bright students who get the items correct and slow students who get
the items wrong.

Educational deprivation - Educational practices or cultural exigencies
which do not allow students to learn in accordance with their
abilities.

Effect (learning law of effect) - Man learns what is pleasurable or is
satisfying to learn.






Evaluation - Judgment based upon criteria; opinions and values applied
to assessment.

Explication - A process of systematically describing, or the process
of dividing a topic into semantic or logical components.

Frequency (learning law of frequency) - Things learned most often are
best remembered.

Goal - Broad statement of educational purpose.

Hypothesis - A tentative statement of an educational outcome which a
researcher wants to test; a theory to be subjected to investigation.

Imply - Communicating a conclusion to someone else, suggesting an
interpretation.

Infer - Drawing conclusions from data, usually for oneself; coming to
a conclusion. One does not infer to or for someone else.

Learning - A change in behavior modified by experience. Scholars doubt
learning not accompanied by a change in behavior. To say learning is
the mastery of material gives many semantic problems.

Learning style - Specific approach which each learner finds most
advantageous to his learning (visual, auditory, or combination).

Median - The score which is half-way in the distribution so that 50%
of the scores fall above and below. The median is usually very stable
and is not affected by a few extreme scores.

Mode - The most frequent score that occurred in a given distribution.
In a bimodal distribution, a curve will have two modes (peaks).

Multiple reinforcement - Systematic repeating of what is to be learned
and the rewarding of correct responses.

Norm-referenced tests - Tests which are standardized against the
performance of a national sample. Tests which are reported in standard
scores based upon the normal curve.






Objective (learning objective) - A limited amount of student learning
to partially achieve an educational goal; a desired end of a lesson or
course.

Objectivity - A test has objectivity if the same score is assigned no
matter who corrects the test. Essay tests are low in objectivity;
multiple-choice tests high.

Primacy (learning law of primacy) - Things learned first are best
remembered.

Psycho-motor - Learning which deals with manipulative or motor-skills.

Recency (law of recency) - Things learned last are best remembered.

Reliability - Consistency of test results from sequential
administrations; the degree to which a test will give the same results
when administered again.

Serendipity - The process of discovering something important or
beneficial by sheer accident, usually when investigating something
else.

Sociogram - A technique which measures likes and dislikes of members
of a group in regard to other members of a group. A diagram of group
preferences within a group.

Statewide assessment - Gathering data about educational learning on a
statewide basis. Measuring the status of educational achievement.

Taxonomy - An organized listing, the end result of an explication; the
classification of phenomena.

Validity - The degree to which a test measures what it was intended to
measure and nothing else. Proof the test gives the desired
information.
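
As an illustration of the three averages defined in this glossary
(arithmetic mean, median, and mode), the following minimal Python
sketch computes each for a small set of scores. The scores are
hypothetical and chosen only for illustration; they do not come from
any assessment discussed in this book.

```python
from statistics import mean, median, multimode

# Hypothetical test scores for one small class (illustrative data only).
scores = [55, 60, 60, 70, 75, 80, 80, 85, 100]

average = mean(scores)     # scores added and divided by the number of scores
middle = median(scores)    # the score half-way in the distribution
peaks = multimode(scores)  # most frequent score(s); two values mean bimodal

# One extreme score moves the mean a great deal but the median very little.
with_outlier = scores + [300]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)

print("mean:", round(average, 1), "median:", middle, "modes:", peaks)
print("with one extreme score -> mean:", mean_with_outlier,
      "median:", median_with_outlier)
```

In this made-up distribution the two most frequent scores tie, which
is the bimodal case described under Mode, and adding a single extreme
score shifts the mean far more than the median, which is the stability
described under Median.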






CHAPTER I 



STATEWIDE EDUCATIONAL EVALUATION 
Some Philosophical Considerations 

It is fashionable for citizens and their representatives to say they 
want to measure what the schools are doing, to ascertain what the 
students are learning, and to see if educational monies are being well 
spent. What is more, people are beginning to want to compare schools, 
school districts, counties and even states. Because these facts are so, 
projects for statewide assessment have begun in a large number of 
states. National studies and commissions have been organized, and 
several private corporations have retooled to aid in these endeavors.

Certainly, it is true, our society lacks a systematic method for ap- 
praising the quality of its schools. We have at our disposal many means 
for assessing student performance, especially by the norm-reference 
method. We can muster impressive data about individuals. However,
little of this is systematic and neither can this information tell us 
much about the quality of schools and school districts. We need, it 
would seem, better ways of looking at what we are doing, perhaps to 
include comparisons with other societies. 

Testing and publishing companies, even those which are chartered 
as non-profit organizations, certainly can be expected to react to new 
demands. They must compete for a share of educational funds so vital 
to their continuance. This greatly explains their willingness to coop- 
erate in statewide assessment projects. Some politicians and elected 
representatives of the people have been sensitive to popular demands 
for evaluation and assessment. Services deemed necessary by the 
majority (or the credibly noisy) will eventually be implemented. 
Public pressure explains much willingness to participate in and to al- 
locate resources for assessment. Explanations of the motives behind 
the rather general clamor among the public for assessment are harder 
and require more inspection, more introspection, and less cynicism. 

Perhaps it could be argued, though not convincingly, the press for 
assessment was sold to the people by their leaders and by the professional
testers through the mass media, and now these same persons
are merely responding to the induced demands. In any case, the de- 
mand is present and must be served. How to serve it wisely is the 
problem. This handbook is meant to help state board of education 
members consider the philosophical, political and technical ramifica- 
tions of the measurement of school achievement. It is also intended 
to be a resource book for all citizens who want to study both the re- 
wards and dangers inherent in statewide assessment of educational 
efforts. 



If we totally assess the full spectrum of educational accomplishment 
and know how all our students are performing in all their subjects, 
and know likewise how our schools stand in relation to each other, 
and how our school districts are doing, what will we have proved?
Before we start such a major task, we must certainly realize that no 
matter how good we are, one half of all our pupils will score average 
or below in relation to the total pupil population; one half of our 
schools and school districts will do likewise. Also we will find that 
middle and upper-class students will in general get more than their
share of the better marks. Why then should state board of education 
members be asked to support a movement which will discover what we 
in general already know? 
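
The arithmetic behind this point can be sketched in a few lines of
Python. The sketch reads "average" as the median, which is an
assumption about the authors' meaning, and the scores and the helper
name share_at_or_below_median are hypothetical, introduced only for
illustration.

```python
from statistics import mean, median

def share_at_or_below_median(scores):
    """Return the fraction of scores at or below the group's own median."""
    m = median(scores)
    return sum(1 for s in scores if s <= m) / len(scores)

# Hypothetical scores for ten pupils (illustrative data only).
baseline = [48, 52, 57, 61, 66, 70, 74, 79, 83, 90]

# Suppose every pupil improves by ten points.
improved = [s + 10 for s in baseline]

print(share_at_or_below_median(baseline))  # share relative to the old median
print(share_at_or_below_median(improved))  # share relative to the new median
print(mean(improved) - mean(baseline))     # how far the average itself moved
```

Under these assumptions, half the pupils sit at or below the median
both before and after the improvement, even though every score, and
therefore the mean, has risen. A ranking relative to the group cannot
by itself show whether the group as a whole is doing better.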

Starting with a brain-set that assessment is best accomplished by 
norm-referenced measurement is probably a lack of open-mindedness. 
Good statewide assessment may make only limited use of norms and 
probably will focus instead on criteria. These criteria may be logical, 
predetermined, and arbitrary. Practical accomplishment of goals is 
what we are after and assessment can tell us if we are succeeding. 

If we can generate an educational structure which both will allow
the vast majority to become self-reliant and productive adults and
will have flexibility to serve individuals throughout their lives, differential
performance may prove useful rather than discouraging.
Bearing in mind that statewide assessment is analysis of our system,
as much as it is measurement of individuals, we should not be led astray 
by discrepancies in individual potentialities. To justify assessment we 
need reasons for it and to paraphrase a previous point, state board 
members should be able to recite at least a few. 

The practical comes to mind first. State boards must make recom- 
mendations to legislatures. They often determine minimum curricular 
goals. They must allocate resources, at least those of state departments 
of education. Surely they help set state priorities for education and 
participate in planning. Assessment can provide the raw material for 
these kinds of choices as well as the test of them. This is a good reason
for assessment. Others are more subtle but equally valid.

Some of us are curious, a large number of us. The reasons are many 
and a few of them should be explored here. We want to measure 
school performance because we do believe it to be measurable and
we really want to see what people know, can know, and ought to
know. Like pure scientists, some of us adopt the optimistic hypothe- 
sis that we will discover good uses for the information if we only had 
it, just as many uses for the population census have been discovered 
beyond the imagination of our founding fathers. In the past we have 
discovered that test results are useful for individual counseling, are
useful for discovering correctable weaknesses, and can aid in curriculum
construction. The goals of past testing programs were primarily
directed at individual uses, although school administrators fortunate
enough to head districts of high scoring pupils may have used test results
for what could be called propaganda purposes. Faith in serendipity
may not be a good argument to use in advocating statewide assessment
to taxpayers, especially considering the inevitable high costs.
But honesty requires this admission by many.

Another reason some of us want statewide assessment is in order 
to make comparisons possible. We can see if districts which spend 
more money do better. We can see which of our school citizens have
high achievement and where we are getting the most for our money. 
At the same time we can find our weaknesses so that we can admit 
them, and hopefully correct them. We are not sure of the full consequences
of comparisons; neither have we deduced a complete explication
from among the infinity of possible comparisons. Honesty again
requires us to admit that we know all school districts are spending to 
or near the limit of their incomes and can always make use of more 
funds. In other words, rich districts almost always spend more money 
per pupil. We also know that our big cities will not do as well aca- 
demically as their suburbs regardless of educational expenditures. 

Accountability is a fourth reason for statewide assessment, one 
which has much popular appeal and derives from the ability to com- 
pare results. Some of us believe that if we can make comparisons, 
we can hold schools and school districts accountable. We can say they 
have or have not been faithful stewards of our children and our re- 
sources. Much of the steam behind demands for statewide assessment 
obviously stems from the hope that accountability is both feasible 
and possible. Prevalent, of course, among taxpayers is the belief that
many educational professionals are voracious rascals creative only in 
their collective ability to squander public monies. Accountability is 
seen as one way to curb these appetites. Some demands for accountability
are based upon the assumption that assessment can prove how
good individual teachers are. The consideration, to use a most apt
cliche, is a real can of worms. It nevertheless exists as a motive for
assessment. 

A fifth push for statewide assessment comes from advocates of the 
view that education is partly responsible for the amelioration of dif- 
ferences among children due to economic, intellectual, or physical 
reasons. Because we espouse compassionate ideals of equity, many of 
us want to treat the unfortunate unequally to give them a favorable 
chance to compete in our highly competitive society. Inherent in 
these considerations are many philosophical contradictions (regardless 
of one's point of view), but the argument to augment educational opportunities
for the poor, the handicapped, and the educationally deprived
persists. Educational deprivation can be demonstrated by
testing, measurement and assessment. It proves what would otherwise
be an assumption. It follows then that areas of poor educational performance
can be attacked and reduced. In an educational system based
upon arbitrary standards, this is a valid argument. Measurement based
upon the relativity of the normal curve of distribution will show the
same number of winners and losers. However, the mean for any curve
can be raised. A pessimist would say that wanting to reduce inequalities
is easier than finding solutions to them. Certain recent publications
add to pessimism (especially Jencks, Inequality) but most of us
prefer optimism and are sure we can make things better.

The last reason given here for statewide assessment is that it will aid 
us in decision making. Decision making stems from and relates to the 
practical. We will know where we are strong and where we are weak.
We will also know who is strong and who is weak in an academic sense. 
Armed with this information, we are in a better position to make in- 
telligent decisions. State board of education members make many de- 
cisions, most of them on the basis of personal opinions and values, 
and some of them because of public pressure. With the results of state- 
wide assessment in their hands, state board members can make deci- 
sions more scientifically and more correctly. They can also answer the 
criticisms that arise about decisions more precisely.

To summarize, the six reasons given here for statewide assessment
are as follows:

1. practicality 

2. curiosity 

3. comparison 

4. accountability 

5. amelioration 

6. decision making 

Certain articles of faith accompany these motives to assess. The 
first is the universally held assumption that the way things are done 
in the schools will make a difference in the academic performance of 
individuals. If we didn't believe this, all our efforts would seem futile. 
Much of the research on curricula, probably most of it, hasn't been
encouraging. We have made only a few breakthroughs in procedures 
which dramatically change how children learn but we must always 
assume more are possible. What alternatives do we have to an assump- 
tion that changes in allocations, particularly financial allocations, will 
change achievement patterns? Recently critics have marshalled, we
must admit, impressive evidence apparently debunking the theory
that increased educational expenditures improve educational performance.
These critics have at least shown that unexamined acceptance
of this article of faith is at least naive and possibly false. Thirdly, assessing
educational accomplishments assumes that we know what
causes learning: if we know how students do, we also know why they
do thusly. This means we are willing to grade the quality of our inputs
while we are grading the efforts of the students. A fourth article
of faith is based on the belief that we can cause and control change;
equality is possible; amelioration is feasible; we can do something
about areas of weakness. If we do not believe at least one of the foregoing
articles of faith, then we should not waste much time on assessment.
We should merely follow traditions, public wishes, and professional
recommendations and worry less.

Before we commence any widespread efforts to assess what the 
schools are doing, we have an obligation to know what we are doing
in education and why. As decision makers in the educational process,
state board members should be prepared with answers about what they
believe desirable in education. Their values and philosophical ideas
should be rational and well-defined. Examples of questions educational
decision makers should be able to answer are contained in the
following taxonomy:

Questions for Educational Decision Makers 

What should our school graduates 

1. know? 

2. be able to do? 

3. believe? 

To what extent should the schools 

4. ameliorate social class differences? 

5. emphasize individual excellence? 

6. insist on common minimum standards? 

7. grant diplomas and promotion on standards of performance?

8. treat the exceptional student exceptionally?

What do our citizens believe

9. should be the goals of education?

10. are the essential priorities of education?

11. about the value of student preferences and needs?

What should we do about the contradictions affecting education

12. between freedom, progress, and necessity?

13. between goals of equality and goals of excellence?

14. between what we believe and what we are doing?

15. between societal and individual needs and preferences?

These are hard questions but when we agree to be educational decision
makers, we imply to others that we have answers to these
questions or are at least willing to gain them. Deciding what the 
schools will be like and what they will teach is probably also deciding 
partly how all other institutions will be. It is morally wrong to take 
on such a responsibility without the honesty to also attempt to be a
philosopher on matters relating to education. It is true, then, to
restate, that major educational decisions are philosophical decisions. 
The test of decisions is utility, which may be demonstrated by as- 
sessment, but the soul of decisions is wisdom and insight. These come 
from study, debate, and contemplation. 

The data collected from statewide assessment, like any data, have no
meaning without interpretation. Interpretation requires frames
of reference. Answers to the questions given as examples in the
taxonomy of questions can be the reference points. As you noticed,
the questions were divided into four parts. The first asked about
students and their performance. Obviously we have to have something
in mind for students before testing makes sense. After we have decided
what is good, we can see if our ideals are being approached.

The second group asked questions about social policy. These are 
questions drawn from the classical arguments relating to treatment of 
students. In American society we have responded to our educational 
needs by providing mass education and have somewhat obscured goals 
for individual performance. We appreciate excellence but we have 
been more concerned with typical accomplishment. Many of us are 
now beginning to question our system for this reason. Education by 
behavioral objectives and diplomas for performance are recent ex- 
amples of proposals to return to a more European educational system, 
and one less egalitarian, therefore. If we use assessment to usher in 
arbitrary standards, this is not necessarily wrong but we should do it 
knowingly, anticipating the fallout benefits and detriments. 

The third group of questions requires decision makers to know 
their constituencies. Because we are a democratic republic our decision
makers have an obligation to take into consideration vox populi.
They should not be blind-sided by their own biases, an attitude both
dangerous and arrogant.

The fourth group of questions raises gut-level philosophical issues.
These questions show the many paradoxes hiding in what we believe 
and what we must decide. We do, in fact, believe and act upon many 
conflicting goals. We make many compromises with ourselves as well 
as with others. The best stance to assume in the face of contradiction 
is humility and willingness to study, to listen, and to learn. 

In summary, this chapter is meant to give state board of education 
members a primer in educational philosophy, especially in those as- 
pects relating to problems of measurement of progress toward educa- 
tion expectations. Statewide assessment of education is very costly 
both in money and time. To justify these expenditures, it seems im- 
perative we know what we are doing and why. Besides just telling us 
how well we stack up, statewide assessment may alter many things 
including goals and public attitudes. Many will assume assessment to 
be the end of education itself. We will not want this to happen but 
it will. Others will use assessment as a club, parsimonious legislators 
for instance. Others will fear it, perhaps teachers who stand to be 



ERLC 



- 6 - 



judged by student perrormunce. We don't want to release a box full of 
evil spirits and pestilence into the world as did the mythical Pandora. 
We want to make intelligent and rational use of good educational 
tools and do so with intellectual honesty and fierce personal com- 
mitment. This is the spirit in which this book is given to you. The aim 
is to be utilitarian. 






CHAPTER II



STATEWIDE EDUCATIONAL EVALUATION 
Some Political Considerations 

If, after being introduced to statewide educational assessment, your
comment is "That must be wonderful; I don't understand it at all," then
you are in good company. Moliere's 17th Century remark is very 
appropriate today. It must express the feelings of many as they are 
introduced to the problems and promises of statewide educational 
assessment. 

The previous chapter discussed the philosophy of assessment and 
suggested some major reasons for such programs. The chapters follow- 
ing will present some technical aspects of statewide educational as- 
sessment. The purpose for this chapter is to explore the range of 
choices open to state boards with regard to assessment, and to identify 
some of the implications of these choices. Also presented will be a 
number of the effects which may occur as a result of initiating or ex- 
panding a statewide educational assessment program. 

Many but not all implications will be presented. It will remain for 
state board members of the individual states to analyze their own situations,
and develop their own lists of: "Here is what will happen if we
. . ." This kind of action is vital to effective decision-making by the
board on any question.

The Range of Choices 

At this point consideration must be given to the range of choices 
open to state boards; even some investigation of choices apparently 
closed would be wise. Because a statewide educational assessment 
program can be so complex, because the planning and operations al- 
ternatives are so numerous, and because each state board member's 
concept of the program will vary, the choices suggested below are
reasonably limited. 

The choices, five in number, merely offer a range of possible actions
which may be taken by a state board of education with regard to an
assessment program. The choices are to: (1) ignore it, (2) study it, (3)
pilot it, (4) start it, and (5) rush it. Let's examine each choice briefly. 

Choice 1: Ignore It. This choice will appeal to some who have
found that if something is ignored, it may go away. What better way 
to save time, money, talk, and frustration than to do nothing, or to 
postpone consideration for a year. 

It is not difficult to find evidence to support this kind of decision. 
The fourth annual Gallup survey of the public's attitudes toward the 
schools (Phi Delta Kappan, September 1972) reveals that 56 percent
of the population placed the chief blame for some children doing 
poorly in school on the children's home life; only six percent blamed 
the schools and twelve percent blamed the teachers. This logic indi- 
cates then that since it isn't the schools' fault that children do poorly, 
why mount an expensive statewide educational assessment program 
of limited value? 

Choice 2: Study It. This choice is widely used in all kinds of educational
endeavors. It's a viable, respected position, since on one hand
the problem is not being ignored, and on the other major resources 
have not yet been committed. 

The time allotted and the nature of the study will indicate possible 
next steps. For example, if there is no time limit set on the study, 
and the charge is not concise, this choice will be almost the same as 
Ignore It. If, however, a definite time limit is set (ranging from six 
months to a year), some resources are provided, and a clear charge 
of what is to be done is given, everyone will know the state board is 
serious about the potential value of statewide educational assessment. 
This choice is also useful when an educational issue of more import is 
at stake: it buys some time. But most importantly, this choice does
permit development of a comprehensive plan for statewide education- 
al assessment. 

Choice 3: Pilot It. Many advantages are inherent in this approach.
It is the choice permitting action short of full commitment. A return
to ignoring assessment can be made if the pilot test is aborted or even
if it is completed. Piloting a program will probably call for new resources
from the districts or the state. So much is going on in assessment
that federal and foundation risk money to test and develop
programs is short today. Piloting permits legislators, educators, and
others to take a look at a sample program. Experience gained can be
invaluable if a decision is then made to go ahead and develop a program.
Legislators anxious for the results, however, will not be content
to wait until this phase is completed before having a chance to
make the next decision. School year pilot programs tend to be examined
at the height of the legislative session, which is usually the
middle of the pilot program.

Choice 4: Start It. This is perhaps the most common action that
can be taken, especially considering that quite a number of states are
already underway with statewide educational assessment programs. It
presumes there has been some thought (perhaps even extensive study
or piloting) as to the purposes, procedures, resources, and
implications. This action indicates that the program will be installed
in a state as a regular, ongoing program, subject to continuous
evaluation and improvement. It may indicate, in fact, it should, that
the agencies controlling education have made a long-term commitment,
both philosophical and fiscal, to support the program. It also means
that if the long-range planning is not completed, it soon will be, so
that an effective, efficient operation will result. Given the limited
resource problem, starting an assessment program in this manner may
mean that some other program may need to be curtailed or abolished.

Choice 5: Rush It. Some will choose rushing, either through design,
because of pressures, in order to catch up or get in the lead. Being in
the lead, of course, may have current prestige advantages along with
some not-so-great consequences. Getting on the assessment bandwagon
by this route can lead to some sour notes. Laws mandating a statewide
educational assessment program may force a state board to take
this choice; the manner and time frame within which the task is
accomplished will be important indicators of success. Even counting
some of the disadvantages of this choice, a subsequent alternative is
to ease back to a more modest program (similar to one which might
be operating under Choice 4) and retain the best as the program
becomes institutionalized. Bundles of money and lots of help will be
needed for those states choosing Rush It.

Implications Based on Reasons for Assessment 

The previous chapter considered six philosophical categories of
reasons why states initiate statewide educational assessment
programs. These were: (1) practicality, (2) curiosity, (3) comparison,
(4) accountability, (5) amelioration, and (6) decision making. Let's now
consider educational and political elements, and some rather specific
motives for assessment, and what the choice of motives might imply
about or bring to the program. This list of motives is merely
illustrative. The list of implications is not exhaustive.

If this is your motive for a statewide assessment program, then these
may be the implications:

1. To comply with a legislative mandate
   * The legislature is disenchanted with the reporting by the state
     education agency.
   * The people and the legislators want the schools to be better.
   * Better management of scarce resources is imperative.

2. To have better decision-making information at the state level
   * Someone has defined the kinds of information needed at the state
     level.
   * More state control may be indicated.
   * State decision-makers will read and use the new reports.

3. To have better decision-making information at the school district
   level
   * This will help maintain local control.
   * School district information systems do not have common elements.
   * The current status of the schools is unknown or poorly defined.

4. To have better information for reporting the status of the schools
   to the public
   * The public has a right to have more and better information.
   * Knowing the problems and promise of the schools is important.

5. All four of the above
   * This may be an impossible task!

6. To involve more interested persons in the development and
   evaluation of education programs
   * Local control has restricted the flow into the districts of new
     ideas and outside concerns.
   * Knowledgeable participants are more likely to support the programs
     of the schools.
   * Education shouldn't be left solely to educators.

7. To make state-by-state comparisons of student achievements and
   attitudes
   * Our schools are in need of additional state or federal aid and
     this information will help justify it.
   * The state's students are the leaders when compared to others.
   * The states will use the same assessment instruments.

8. To make district-by-district comparisons
   * Competition for excellence will be fostered in the state.
   * Some local boards and administrators aren't doing a good job.
   * Districts needing additional help (financial or service) will be
     identified.

9. To make annual comparisons of the achievement levels of students
   * A very meaningful analysis of school programs will be possible.
   * Teachers will be able to use this information.

10. To obtain documentation for district reorganizations
   * Student achievement in small districts may be below average.
   * Per student costs in small districts are out of sight.
   * Other reorganization ideas have failed.

11. To determine teacher competency
   * Teachers are competent (or incompetent) but this hasn't been
     proved.
   * Performance-based certification is on the way.

12. To help improve teacher training programs
   * Teacher training programs need to be more responsive to the needs
     of the schools.
   * Information to help modify teacher training programs will be
     readily available.

13. To identify curriculum areas needing improvement
   * Curriculum modifications are not keeping pace with the changes
     required today.
   * The state is considering a state-mandated curriculum.
   * Assessment will cover all major curriculum areas.

14. To form an important element of a state accountability program
   * An accountability program could duplicate an assessment program.
   * Accountability has been mandated and all resources must be focused
     upon it.
   * Assessment is here to stay.

15. To be an important basis for a state PPBS program
   * Student achievement indicators have been missing from the current
     PPBS program.
   * The development of a comprehensive management information system
     is underway.
   * This is a quarterback sneak to get resources for the PPBS program.

16. To replace the state testing program
   * Districts are going to have to pay for their own tests and scoring.
   * Individual test results are only useful as a diagnostic tool in
     the local school.
   * National tests don't meet our needs.

17. To replace the state accreditation program
   * Stricter state standards are soon to be developed.
   * A move is underway to pull out of the regional accrediting
     association.
   * The state accreditation program is still input oriented.

18. To avoid a national (federal) assessment program
   * The current National Educational Assessment Program operated by
     the Education Commission of the States is a forerunner to a
     federal program.
   * Rumors about a national curriculum are spreading again.

19. To keep up with the other states
   * There may be a failure to examine where the other states are going.
   * State goals may be ill defined.
   * It's time to take another look at the reasons.

20. To gather information so that improvement can be made in education
    programs for boys and girls
   * You're a good man, Charlie Brown!



Implications Based on Operating the Program

With reasons and motives for assessment selected, five general
topics associated with operating the program will be explored for
implications. These are: (1) control and management, (2) resources - time
and money, (3) people, (4) the assessment itself, and (5) reporting.

Control. The implications drawn from the type and level of control
of the assessment program will depend to some extent on how the
program originated. If the state board of education and the state
education agency initiated the program through their own efforts, by
administrative action or by proposing enabling legislation, then their
leadership role is likely to be secure. If some zealous or disgruntled
legislators specified the program in legislation, largely without the
advice or support of the state education agency, then the likelihood of
program success could be diminished. The state education agency in
this case may choose barely to meet the literal interpretations of the
state assessment law, figuring the pressure won't be so great in the
future.

The less specific the enabling legislation, the more likely the state
education agency will be able to operate the program successfully. 
For rules and regulations can be added to from year to year, and 
modified to fit the circumstances. Permissive enabling legislation rath- 
er than purely directive legislation may provide just the kind of state 
interest which can appeal to school administrators and local boards 
and bring with it long term success. 

Most state boards of education have statutory duties to (1) oversee
the public schools, and (2) report annually to the governor and
general assembly on the condition of the schools. This kind of
authority has been sufficient for several states to develop assessment
programs. In such instances, the state board of education, through
enactment of clear and positive policies or through adoption of
reasonable rules and regulations, can provide an effective base for
development of an assessment program. The obvious implication is a
strong leadership role for a state board of education.

Can the state board itself escape being assessed, especially on its
role in conducting the assessment program? Certainly not, if it
expects to stay responsive to citizen needs while at the same time
displaying a bit of brinkmanship to lead the way to solution of
controversial problems.

Cunningham, writing for the Designing Education for the Future
Project, offered these comments about state boards:
    State boards of education need to be examined rather carefully
    and their functions assessed. The vitality of state board of
    education leadership varies, but on the whole has not evidenced
    the boldness and farsightedness that seems to be in order.
    (Implications for Education of Prospective Changes in Society)

Resources. Two major resources needed for any program - time
and money - will be important factors in the success of the program.
And these resources must be committed in reasonable quantities for
the long term, for statewide educational assessment is not a one-shot
effort. A long-range plan with a realistic allotment of time and money
is vital.

So the initial expenditure should be for the development of a plan 
for assessment. Expenses for this could range from a possible low of 
$15,000 to as much as $50,000, depending upon the circumstances
within each state. Resources spent in this developmental phase may 
actually save thousands of dollars in operational costs. Four to six 
months of time should probably be allocated to this planning phase. 

Budgetary requirements should be projected for at least three 
years; a five-year projection should be made if at all possible. Annual 
costs will probably vary in each of the initial years, and some of the 
high "start-up" expenses may even occur in the second or third year.

Average annual costs at the state level only will vary greatly. The 
following chart serves to illustrate the range of possibilities (following 
initial planning expenditures): 





                                 Small       Medium-sized   Large
                                 state       state          state

SEA personnel
  (permanent or contract)        $40,000     $ 75,000       $150,000
Developmental costs               15,000       20,000         25,000
Data collection, scoring,
  processing                      10,000       50,000        150,000
Analysis and reporting             5,000        7,500         10,000
                                 -------     --------       --------
Totals                           $70,000     $152,500       $335,000
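
The arithmetic behind the chart can be checked with a short sketch. It assumes the chart's figures are read as whole dollars per year at the state level; the names COSTS and column_totals are hypothetical and chosen only for illustration.

```python
# Illustrative annual state-level costs (dollars), as read from the chart.
# Row labels follow the chart; the dictionary layout is an assumption.
COSTS = {
    "SEA personnel (permanent or contract)": {"small": 40_000, "medium": 75_000, "large": 150_000},
    "Developmental costs": {"small": 15_000, "medium": 20_000, "large": 25_000},
    "Data collection, scoring, processing": {"small": 10_000, "medium": 50_000, "large": 150_000},
    "Analysis and reporting": {"small": 5_000, "medium": 7_500, "large": 10_000},
}


def column_totals(costs):
    """Sum each state-size column across all cost categories."""
    totals = {}
    for row in costs.values():
        for size, amount in row.items():
            totals[size] = totals.get(size, 0) + amount
    return totals


if __name__ == "__main__":
    for size, total in column_totals(COSTS).items():
        print(f"{size:>6}: ${total:,.0f}")
```

Summing each column this way reproduces the chart's Totals row. A state could substitute its own estimates in the same layout when projecting costs over three to five years.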



Let's examine sources of the money first. If the program will be
started or maintained with grant funds, the implication will be that the
state isn't really committed to the program as long as soft money is
the sole support. It is very easy to discontinue a program when
outside funds dry up. State funds channeled through the regular budget
of the state education agency imply that legislators and budget
officers trust the agency to operate the program in the best interests
of all. If such funds are made available in separate appropriations or
by annual amendment of specific legislation, then the legislature may
be indicating that it wishes to retain most of the control.

Does statewide imply that a program is state financed, and is not
costing local districts anything? Not so, as one knows. Even state
assessment plans which provide some funds to districts for staff time, 
proctors, etc., probably never provide for the full local expense. 

Sometimes state funds for the assessment program can only be 
obtained by curtailing or eliminating other state education programs. 
The obvious implication is that assessment is more important than 
the other programs, or perhaps objectives of the other programs have 
been achieved. 

Time allotments for the statewide educational assessment program, 
or major elements of it, are most difficult to suggest a priori. The com- 
plexity of factors affecting time varies from state to state and with the 
scope of tasks attempted. The relative nature of deadlines could lead 
to implications that:

1. one state's assessment program, developed carefully and
cautiously, with adequate resources and maximum involvement
by the education community, would become fully operational
in four years, or

2. another state's assessment program, developed less carefully
and in haste, with inadequate resources and minimum
involvement by the education community, would become fully
operational in four years.

Selecting the right deadline will be both a political and professional 
decision, with political and professional implications. Generally 
speaking, however, the legislators and the public will believe too 
much time was taken in developing the program; educators will be- 
lieve too little time was taken. 

People. Now for a look at the people involved in and subject to a 
statewide educational assessment program. Both authorities and com- 
mon sense say that wide involvement is necessary. The definition of 
wide will be up to each state, as it will depend on the size of the
state, the number and kind of interested groups, or the number and 
size of the groups which must be induced to become involved. 

Certainly the critics of assessment must be invited to express their 
views in planning sessions. Their concerns will provide valuable
tempering to the plan as it is developed. To not involve the critics will
leave the implication that all decisions have been made (or will be 
made) by the insider advocates and state education agency personnel. 

Wide involvement is called for too, so that more people will be
better informed and educated about the statewide educational
assessment plan. Allotting travel expenses for persons to be involved
in planning and work sessions will be the best indication that their
advice is sought and respected.

Numerous choices are open for the kind and number of professional
staff members needed, and all choices depend upon the kind
of program to be developed and the time and money available.
Excessive use of out-of-state consultants can erode the leadership
position of the state education agency, or cause in-state higher education
staff to feel miffed. There may be no other way, however, if time is
short through legislative mandate. Assessment specialists brought in 
on a consultant basis should not only assist in the planning, but offer 
any needed training for state and district personnel. 

A state law or board policy directing that a state assessment unit 
be established in the state education agency has advantages and dis- 
advantages. If such specific direction is given, the implication is that 
assessment is here to stay, and so it must be visible and be
institutionalized. Whether or not the specific unit on assessment is established,
the state education agency must have some permanent staff expert 
on this topic who can direct or coordinate the program. (Be prepared 
for NASEAMSEA, the currently fictitious acronym for the National 
Association of State Education Agency Managers of Statewide Edu- 
cational Assessment.) 

Use of knowledgeable school district personnel including teachers 
as paid consultants (where legal) in the planning and operation of the
program will provide untold dividends. If it is not legal to pay them, 
then they can be given temporary state assignments by cooperation 
with school districts. The implications: district personnel can do the 
job, and talent will be around to help evaluate and modify the assess- 
ment program in five years. 

The largest group to be involved, students, should not be
overlooked. This group will probably view assessment as just some more
tests. Perhaps the most appealing reason for some to participate 
willingly is the knowledge that assessment is not a pass-fail task, and 
that the results will not be used to rank individuals. The older students' 
awareness of the purpose of the testing will probably leave most un- 
impressed, for if there are to be changes made in the curriculum or in 
teaching methods because of the new decision-making information,
the changes will come too late to serve the student who is being 
assessed. 

This particular point deserves further consideration. Will the assess- 
ment reveal the student's performance in terms of today's or tomor- 
row's objectives, or yesterday's? Certainly there are few who would 
want to measure students' abilities to cope with the world of 20 
years ago. But is it possible to devise measures which will portend 
the students' potential 20 years from now? Many talk about
educating youth for the world of tomorrow; perhaps it is enough to
educate them for the world of today. The implication is that the
assessment measures of today are just that, and that to assess for tomorrow
isn't possible. Who knows the future well enough?

Certainly a partial answer to this is that the assessment measures 
will be modified, changed, and improved each year. If not, another 
implication arises - that the whole assessment program would tend 
to set the standards for student performance, curriculum offerings, in 
fact the whole education program. The creation of such absolute 
standards could certainly lead to social stratification alien to the tra- 
ditional purposes of education. With the possibility of fairly rigid 
standards through institutionalization of the program, the school
districts might routinely prepare students for the assessment. And 
teachers could be trapped into teaching for the test. 

The Assessment. Now to the assessment itself. Assuming the
question of Why assess? has been answered, we find the unanswered
questions to be grouped under four W's and an H. The major questions:
(1) what is to be assessed, (2) who will be assessed, (3) when will the
assessment occur, (4) where will it occur, and (5) how will it all be
operated.

Under What is to be assessed choices such as these arise: What is 
the language proficiency of 10-year-olds in our state? Do our seniors 
possess sufficient arithmetic skills to enable them to make change 
and complete a sales slip? What are the problem solving abilities of 
eighth graders? What are the general feelings of sixth graders with 
regard to persons of ethnic or racial backgrounds different from theirs?
What are the average physical skills of tenth graders? 

The choices of what are to be assessed lead to a major implication: 
If the results indicate deficiencies then action should be taken to 
erase the deficiencies. That's a mighty big order. For since the
assessment is conducted from the state level, and since education is a state
function, the state, most notably the state board of education, must 
be prepared to respond. There is always the implication too, that 
someone had a pet categorical program in readiness, and will be using 
the assessment to obtain information to justify legislative enactment 
of the program. 

Who will be assessed, where will they be assessed, and when will 
they be assessed are all dependent upon the reason for assessment and 
resources available. If the assessment program is going to examine 
persons not in public schools, e.g., dropouts, graduates, private school 
students, institutionalized youth, then the implication is that those 
in charge really want a statewide assessment, one which will provide 
useful information to other groups and agencies. 

The size of the state sample will also determine if every school 
district will be involved, and to what degree. If the scope of the 
program permits the full range of instruments to be administered in
every school district, then the implication is that district comparisons
are fairly important, and that they will be made. Some districts will
certainly seek to have the full range of assessment (even if additional
testing must be done) while others will be content with being only in
the regular sample, so no district comparisons can be made.

How the assessment will be operated, and what instruments will be
used raise a host of implications. Construction and validation of items
tailored to a state's objectives for education implies that the state's
objectives are somewhat unique and that it is important to see if they
are being reached. Use of ready-made norm referenced instruments 
doesn't necessarily indicate the opposite. Indeed, it may demonstrate 
that getting the assessment underway fast is important. The use of 
instruments with national norms will of course permit several kinds 
of comparisons to be made between your state and region and others. 

The ideal assessment program might utilize both testing service and
state-developed norm- and criterion-referenced instruments to measure
skills and attitudes in the cognitive and affective domains.

Reporting. Reporting and using the results of a statewide education- 
al assessment program must be carefully planned. The results will be 
open to hundreds of interpretations, and excessive use of numbers 
tends to make the results less meaningful to many. The reports must
be clear and concise, for this is the decision making information so 
eagerly sought. A quick look at the kinds of information and the 
major users will reveal several implications. 

Kind of Information        Major User or Audience

1. Student                 Teachers, parents

2. School and district     Local boards, school administrators,
                           and state education agency

3. Regional and state      State education agency, public,
                           legislators, other states

The implication is that the assessment information would be
available and useful to all the groups listed in the right hand column. But
this is not a possibility, unless every student is included in the
assessment. Care in identifying the major user or audiences is essential,
according to Womer, who states:

    It is important to gather information that will be maximally
    useful to someone: the legislator, the CSSO, superintendent, the
    teachers, the pupils, the parents. It is not possible to provide
    maximally useful information to each of these audiences . . .
    (Womer, Frank B. Developing a Large Scale Assessment Program.)
The nature of the assessment report released at the state level will
have a definite effect on subsequent actions. Consider the following
sharply worded status report on education in a state:

    It is obvious that neglectful school-committees, incompetent
    teachers, and an indifferent public, may go on degrading each
    other until the noble system of free schools shall be abandoned
    by a people, so self-abased as to be unconscious of their
    abasement. (Nolte, M. Chester. An Introduction to School
    Administration: Selected Readings.)
The year was 1837 and the author of the statement above was Horace
Mann, secretary to the Massachusetts State Board of Education. This
is cited here also as a reminder that educational assessment is not a
new endeavor. Perhaps if the style and vigor of Mann's annual reports
had been followed by all states through the years, today's demand
for assessment programs would be unnecessary.

Summary 

In this chapter some choices have been presented to state boards
of education regarding the initiation of statewide educational
assessment programs. Reasons for assessment programs and some
operational aspects of assessment programs were explored and were
considered. Efforts were made throughout to cite implications of the
choices and actions which may be made or taken by the state board
of education and others responsible for the quality of education in a
state. That's what it's all about: the betterment of public education
for boys and girls. Whether this happens may depend upon the
decisions made by state boards of education about assessment programs.






CHAPTER III 



LEARNING 

What Is Learning? 

Man learns. This skill, coupled with the means of communication
from one human to another, is the reason why men, rather than
some other species, rule the world. Many animals are larger, stronger,
and more agile than man, but man, through the power of his intellect
and his ability to communicate with other mortals, prevails.

The above supporting illustration surely comes as no surprise to
the reader. In fact the reader most probably has regarded this as a
self-evident truth for a long time. Why mention it here? Answer:
because it's basic. Learning, especially human learning, has been a prime
psychological area of rigorous investigation since the science of
psychology was founded less than a century ago.

Psychology as a science represented a merger of bodies of
knowledge from both the natural sciences (physics and chemistry) and
philosophy. With this as a background it was not too surprising to
find some psychological investigators (notably E. L. Thorndike)
conducting studies into human learning in an attempt to derive some
basic laws for learning.

These include: 

Learning Law     Definition

1. Primacy       Thing learned first is best remembered.

2. Frequency     Thing learned most often is best remembered.

3. Recency       Thing learned last is best remembered.

4. Effect        Man learns what is pleasurable or satisfying to
                 learn.

Lincoln's Gettysburg Address provides the reader with some evidence
that Laws 1 and 3 could have some relevance. For example, if the
reader is over 30 years of age, chances are that in his or her secondary
school literature experiences, he or she was required to commit Lincoln's
Gettysburg Address to memory. You were probably also required to
store this piece of literature in your memory so it could be instantly
retrieved for recital to your literature teacher. Let's see how well the
law of primacy applies. If provided with the cue "four score and seven
years ago," you can most probably respond "our fathers brought forth
upon this continent a new nation conceived in liberty and dedicated
to the proposition that all men are created equal." If you could make
this response, then Thorndike would say that the law of primacy was
operating (things learned first are best remembered). Now, if you
were provided with the cue ". . . that we here highly resolve that these
dead shall not have died in vain; that the nation shall, under God, have
a new birth of freedom, and . . ." you will also respond "that
government of the people, by the people, for the people, shall not perish
from the earth." Similarly, if you could provide the famous words
which conclude the Gettysburg Address, Thorndike would postulate
that the learning law of recency had been operating (things learned
last are best remembered).

The law of frequency, unfortunately, has been learned too well by 
the people who prepare advertising scripts for television commercials. 
Multiple reinforcement, which the law of frequency postulates, re- 
quires that the television viewer be exposed to a series of pictures of 
men with the wet look and the dry look; more than one housewife 
who has had her Clorox taken away from her; as well as the two 
freaky plastic men who have differing amounts of aspirin products in
the brain and stomach. When these multiple reinforcement
commercials are repeated, there is a great tendency to use this commercial
television time to go to the refrigerator for a snack, empty ashtrays,
etc. The television advertiser really doesn't care, because once you
have seen again the first few seconds of his thirty-second commercial
the rest of the commercial and especially the product have been
brought to active consciousness, and you will be more prone to
purchase that product when next you are shopping.

Thorndike's law of effect states that man has a greater tendency
to learn when the thing to be learned is perceived as being pleasurable 
or satisfying. The law of effect tends to view man as being hedonistic, 
that is, pleasure seeking, pain avoiding. The extent to which people 
devote their time and talents to hobbies tends to support this law. In- 
school learning is helped when the room temperature, ventilation, 
color of the classroom walls, chalkboard, etc., are comfortable and 
pleasing. Yet these same effect qualities in and of themselves do not 
cause learning to take place.

Thorndike's laws of learning have not withstood the test of time 
and further psychological inquiry. Modern learning psychologists dis- 
credit the first three laws (primacy, frequency, and recency), and 
only ascribe partial credence to the fourth (effect). The truth of the
matter is that psychologists do not know what learning is!

What people know before a particular learning sequence can be 
assessed either verbally, through pencil and paper tests, or by asking 
the person to perform some psycho-motor task. Environmental con- 
ditions predicted to be helpful for learning can be provided. Infor- 
mation can be presented on a logical, experiential, developmental or 
other basis so the learner progresses from the known to the unknown.
After learning has occurred, the student again can be tested to
determine the extent of his learning, but what happens within a person's
body (preferably in the gray matter in the human brain) at the instant
or duration of learning remains a psychological mystery.

Psychologists have defined learning as a change in behavior
modified by experience. Note that the psychological definition is more in
terms of what learning does rather than what learning is. Since the
exact nature of learning remains unknown, most psychologists tend
to regard it as a process. In the most recent psychological literature,
learning is treated as a theory rather than with laws or principles. In
science theories are neither true nor false but gain value in terms of
their usefulness and predictiveness.

At this point a state school board member may be somewhat dis-
concerted to discover that public education, dedicated to the business
of learning, is founded not only on imprecise science but also on a
somewhat uncertain art. School professionals have grown comfortable
operating with this uncertainty. They merely assume validity for
their theories and procedures, often pointing to traditions as veri-
fication. Public acceptance of these traditions aids in their continuity,
although this acceptance is now being challenged. Regardless of how
state board members view their present roles with respect to student
learning in their states, two facts emerge: (1) at this time no person or
group of persons knows many of the answers to how students learn;
and (2) state board members have the opportunity to influence stu-
dent learning in the states if, and only if, they are informed of the
most up-to-date trends in educational studies of learning and are
willing to be agents for change in the public schools.

Kinds of Learning

In this section the kinds of learning will be examined together with 
educational applications of each. Essentially three kinds of student
learning are of interest to educators. These are:

Kind of Learning              Definition

1. Cognitive Learning         which deals with the recall or recog-
                              nition of knowledge and the development
                              of intellectual abilities and skills.

2. Affective Learning         which describes changes in interest,
                              attitudes, and values, and the development
                              of appreciations and adequate adjustment.

3. Psycho-motor Learning      which deals with the manipulative
                              or motor-skill area.

At the risk of oversimplification it might be said that more than
one kind of learning is involved in the student acquisition of any
school (or subject matter) related concept. For example, the cog-
nitive component (kind) of learning would be involved in having each
student understand the number facts of multiplication of single digit
numbers one through nine. Influencing this cognitive component
would be the student's attitudes, appreciations, or interest in this
subject matter. Manipulative skills, such as those taught in vocational
or home economics courses, along with those skills acquired in gym-
nastics or other physical education courses certainly would involve
psycho-motor components in addition to either a cognitive or affec-
tive component or both.

In the more traditional approaches in vogue during the first half of
the twentieth century, learning in public education appeared to oper-
ate on the sponge theory. In the sponge theory of learning, the teach-
er, as a master learner, was regarded as a very pliable sponge filled
with the fluid of learning. Students would have their sponges moist-
ened by the liquid of learning from the teacher. From a 1970 per-
spective the description of the sponge theory sounds somewhat ob-
tuse - and it was. Under the sponge theory, learners were relegated
to a subservient position in the teacher-learning process. The student
was not encouraged to become an independent, inquisitive learner
able to pursue learning which may not be within the confines of the
teacher's core of experiences.

Other early procedures assumed the mind was like a muscle; if 
exercised it would grow smarter in capacity. Another theory, one
still held widely by the uninformed, assumed the mind had abstract
capacities such as logic and reasoning which could be developed in-
dependently and transferred to other disciplines. All three of these
theories dominated curriculum construction and brought about a
tedious, repetitious, elitist, and autocratic school program. 

More modern learning methodologies are based upon a series of 
educational assumptions: 

1 . Each learner is a unique human being possessing some of the 
common competencies of his peers in addition to his personal 
talents. 

2. Each learner should be afforded opportunities to develop educa- 
tionally to the limits of his or her varied abilities. 

3. Each student should be afforded a variety of approaches for
learning, so that one can be found which best matches the
student's learning style.

The learning style referred to in the third assumption is related to
the specific approach which each learner finds most advantageous to 
his learning. The learning style not only varies from child to child, but 
also from one subject matter to another for the same child. Some 
learning styles are visual only (reading and seeing), others auditory,
while still others require some combination for the most effective 
learning. 

Cognitive Learning. The content of most subject matter areas taught
in public schools can most appropriately be referred to as cognitive
learning. Levels of cognitive learning include all the knowledge, un-
derstanding, application, analysis, synthesis and evaluation of the
facts, principles or concepts to be learned in a given unit, quarter, or
semester in a given course. According to the Taxonomy of Education-
al Objectives, Cognitive Domain, these levels are defined as follows:

Level of Cognitive Learning   Definition

1. Knowledge                  The recall of specifics and universals,
                              the recall of methods and processes,
                              or the recall of a pattern, structure,
                              or setting.

2. Comprehension              Understanding or apprehension such
                              that the individual knows what is
                              being communicated and can make
                              use of the material being communi-
                              cated. (Ability to visualize the rela-
                              tionship between and among con-
                              cepts.)

3. Application                The use of abstractions in particular
                              and concrete situations. The abstrac-
                              tions may be in the form of general
                              ideas, rules or procedures, or gener-
                              alized methods. The abstractions
                              may also be technical principles,
                              ideas, and theories which must be
                              remembered and applied (problem
                              solving).

4. Analysis                   The breakdown of a communication
                              into its constituent elements or parts
                              such that the relative hierarchy of
                              ideas is made clear and/or the rela-
                              tions between the ideas expressed
                              are made explicit (deduction).

5. Synthesis                  The putting together of elements
                              and parts so as to form a whole. This
                              involves the process of working with
                              pieces, parts, elements, etc., and ar-
                              ranging them and combining them
                              in such a way as to constitute a
                              pattern or structure not clearly
                              there before (induction).

6. Evaluation                 Judgments about the value of mater-
                              ial and methods for given purposes.
                              Quantitative and qualitative judg-
                              ments about the extent to which
                              materials and methods satisfy cri-
                              teria. Use of a standard of appraisal.
                              The criteria may be those deter-
                              mined by the students or those
                              which are given to them.

These levels of cognitive learning are usually contained in the edu-
cational objectives developed by the teacher for a given lesson, for a
unit, or for a course. Depending upon the complexity of the concepts
or procedures to be acquired by the student, the teacher will vary the
methodology of presenting the information to be learned. For low
level knowledge the teacher perhaps will present the information
verbally to a group in class. The teacher may choose to employ ex-
amples, comparisons, statistics or testimony of authorities in support
of the main ideas. This support material could also take the form of a
film strip, motion picture or audio tape in addition to the teacher vo-
calization. If the teacher were attempting to individualize the student
learning in her class, she might choose to present the class with a series
of learning alternatives to acquire the concepts. Using this methodol-
ogy, the teacher is the manager of the learning situation. The teacher
works closely with each learner either singly or in small groups when
more than one learner selects the same learning approach and chooses
to work with other students.

Teacher methodologies tend to change from lesson to lesson, course 
to course, and from the early elementary to the higher secondary 
grades depending upon: the complexity of the concepts to be learned; 
the sophistication of the student; and the available support (audio 
visual) materials which the school district can bring to bear to assist
the teacher. Most learning content in grades K through 12 represents
cognitive learning. 

Affective Learning. The content of learning carries with it, aside 
from the mental gymnastics of the cognitive domain, an affective
component in the mind of the learner. Each learner tends to like, to 
dislike, or to be ambivalent, to all that he learns. Additionally, society 
demands that succeeding generations acquire positive attitudes toward 
law and order, democracy, and moral and ethical values. Levels of af- 
fective learning include: receiving; responding; valuing; organization 
of a value; and characterization by a value. In the Taxonomy of
Educational Objectives: Affective Domain, these levels are described
as follows: 

Affective Learning Level      Description

1. Receiving                  The learner be sensitized to the exist-
                              ence of certain phenomena and stimu-
                              li; that is, that he be willing to receive
                              or attend to them.

2. Responding                 This is the category that many teachers
                              will find best describes their interest
                              objectives .... indicates the desire that a
                              child become sufficiently involved in
                              or committed to a subject, phenom-
                              enon, or activity that he will seek it
                              out and gain satisfaction from working
                              with it or engaging in it.

3. Valuing                    . . . that a thing, phenomenon, or be-
                              havior has worth. This abstract concept
                              of worth is in part a result of the indi-
                              vidual's own valuing or assessment, but
                              it is much more a social product that
                              has been slowly internalized or ac-
                              cepted and has come to be used by
                              the student as his own criterion of
                              worth.

4. Organization (of a value)  As the learner successively internalizes
                              values, he encounters situations for
                              which more than one value is relevant.
                              Thus necessity arises for (a) the or-
                              ganization of the values into a system,
                              (b) the determination of the interre-
                              lationships among them, and (c) the
                              establishment of the dominant and per-
                              vasive ones.

5. Characterization           . . . The values already have a place in
   by a value                 the individual's value hierarchy, are
                              organized into some kind of internally
                              consistent system, have controlled the
                              behavior of the individual for a suf-
                              ficient time that he has adapted to be-
                              having this way; and an evocation of
                              the behavior no longer arouses emotion
                              or affect when the individual is threat-
                              ened or challenged.

Teachers are usually involved with affective learning up to and in-
cluding Taxonomy level three (valuing). Professional testing companies
are capable of developing tests to include levels four and five. The
affective component of the cognitive learning content is of great in-
terest to teachers, since students tend to learn better those things
which they like or have positive attitudes toward. Student attitudes
expressed in civics courses, in group discussions of "what would you
do if" situations on public affairs topics, or in comments on the
practical application of vocational or home economics topics to real
life situations similarly need to be known and capitalized upon by the
teacher. The attitudes a student leaves school with are the result of many
years of attitude and value formulation. Teachers and school systems
can seek to foster positive attitudes on the part of their students if
the teachers or other school officials are willing to discover student
feelings and attitudes at all points along the way in the educative
process. School personnel must also be willing to develop corrective
strategies and approaches to overcome those educational practices
which unnecessarily provoke unfavorable student feelings, emotions
and values.

Psycho-Motor Learning. Psycho-motor learning involves those skills,
procedures or operations which involve some manipulation of
objects or the body itself. These learning activities are most often as-
sociated with handwriting, speech, physical education, industrial arts,
and technical courses. Although the manipulative component of psy-
cho-motor skills is assessed by some sort of performance indicators,
most teachers tend to regard the course content as they regard other
cognitive student learnings. Thus, while a demonstration-performance
methodology (wherein the teacher explains and demonstrates the
skill, the student performs under teacher supervision, and the student
performance is evaluated as to accuracy of procedure and outcome)
is fairly common to psycho-motor aspects of physical or vocational
education courses, teachers continue to depend primarily on cognitive
processes to ascertain overall student learning.

As teachers, curriculum specialists and school administrators isolate 
and group concepts to be learned into lessons, units and courses of 
study, objectives identifying these competencies are written for use 
by the classroom teacher. The specific methodology each teacher will 
employ to assist students in learning these concepts will be partially 
dictated by the nature of the content of each educational objective. 
The teaching methodology also is influenced by the specific nature
of the learning environment. Self-contained classrooms, team teach- 
ing, individualized instruction, departmental organization (usually at 
the secondary level), modular scheduling, etc., each have methodologi- 
cal advantages and limitations for the teacher. 

The remaining sections of this monograph will be devoted to: the 
measurement and evaluation of student learning; the problem of 
appropriate educational criteria; some suggestions for reporting the
outcomes of evaluating student learning; and some rules of thumb 
which state school board members might employ to assist them in 
the evaluation of student learning resulting from curricula under 
their purview. 






CHAPTER IV 



MEASUREMENT 

Measurement is the art of quantifying something. E.L. Thorndike
in the pioneer days of the testing movement in the United States was
known to have made the very modest comment, "If something exists,
it exists in a quantity. If it exists in a quantity, I can measure it."
Measurement may take one of several forms. When a person visits his
physician he is weighed, has his height taken, has his blood pressure
taken. He has his urine and blood analyzed for specific gravity and
composition. The doctor then records these measurements, and by
comparing these recordings against some standards, judges (evaluates)
the condition of the patient's health.

The same is true in educational measurement. The purpose of edu-
cational measurement is to assess whether the student has attained
(learned) the various objectives of the curricula. (For practical pur-
poses in education the words "measurement" and "assessment" are
synonymous.)

What kinds of measurement do educators employ in their trade? 
Certainly many are used. The nature and form of the measures vary
according to the requirements of educational objectives. The physical
education teacher may employ a modified check list which he anno- 
tates every time a student masters a given feat in gymnastics. Another 
physical education teacher may record the number of pushups, pull- 
ups, situps, etc. and measure the progress each student makes through- 
out the course. 

The social studies teacher may well employ a sociogram to measure
the group dynamics of her class. Through this technique, the stars
(very popular students), cliques, and the isolates (students who don't
have friends in the class) are determined. The social studies teacher 
also may employ an affective questionnaire to measure the class feel- 
ings toward race, government, and social issues. 

The English teacher tends to employ student essays to measure both 
the content which students have learned and also the students' ability 
to organize their thoughts coherently in written communications. The
English teacher also uses the verbal question and answer technique
with students in the classroom to measure whether the students ap- 
pear to be learning. 

Problem solving formats are employed by virtually every teacher to 
ascertain whether students have mastered a given set of operations. 
This is true in the demonstration of mathematics procedures as it is
also true in logical presentation of points of view in a "describe and
compare" type of essay question in Civics.






Virtually all schools employ objective tests (either nationally or
locally prepared) to measure the student learning in such areas as
reading speed and comprehension, math concepts and computation,
English usage, History, Social Studies, etc.

These various educational measures are recorded by the teacher on
either an absolute scale (percentage or letter grade) or on a relative
basis (a student's position with respect to some national or regional
norm, such as the 60th percentile).
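
The relative basis described above can be illustrated with a short
sketch. The function name percentile_rank and the sample norm-group
scores are illustrative assumptions, and the definition used (the
percent of norm-group scores falling below the student's score) is one
common convention, not the procedure of any particular test publisher.

```python
def percentile_rank(student_score, norm_scores):
    """Percent of norm-group scores that fall below the student's score.

    Assumed convention: only scores strictly below the student's score
    are counted. Published tests may treat tied scores differently.
    """
    if not norm_scores:
        raise ValueError("the norm group must contain at least one score")
    below = sum(1 for score in norm_scores if score < student_score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm group of five raw scores.
norm_group = [50, 60, 70, 80, 90]
rank = percentile_rank(75, norm_group)
print(f"A raw score of 75 falls at percentile {rank:.0f} of this norm group")
```

An absolute score (75 percent correct, say) describes the student
against the test itself; the percentile describes the same student
against the norm group, so the two figures need not agree.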



Kinds of Educational Measurement

Usually four kinds of educational measures are found in public
school programs: (1) intelligence tests, (2) achievement tests, (3) apti-
tude tests, and (4) diagnostic tests.

Intelligence Tests: These instruments purport to measure the
student's intellectual capacity. The two major kinds of intelligence
tests are (1) group and (2) individual. As the labels signify, a group
intelligence test can be administered to a class under the guidance of
a teacher. An individual intelligence test (Stanford-Binet, Wechsler) is
administered to a student by a psychologist on a one-to-one basis.
Scores resulting from these tests vary. Some tests yield a total score
only, while others yield scores for verbal, quantitative and total. The
obtained score then is converted into an intelligence quotient. L.M.
Terman of Stanford University in 1916 first described the Intelligence
Quotient as the ratio between mental age and chronological age:

$$\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100$$

Psychological studies of intelligence tests tend to reveal that they are
culturally and ethnically biased in favor of the white, middle class
American. Intelligence tests are often used to predict a rate of expected
student growth. The child whose IQ is 100 usually progresses 10 months
in grade achievement (one school year) per year, the 80 IQ youngster
typically will progress 8 months in grade achievement, etc.

Achievement Tests: Achievement tests are designed to measure a
student's educational progress in comparison with other similar stu-
dents nationwide. Such measures as the California Achievement Test,
Iowa Test of Basic Skills, Metropolitan Achievement Test, Sequential
Tests of Educational Progress, Stanford Achievement Test, Test of Aca-
demic Progress, Wide Range Achievement Test, and others, contain
subtests which assess reading, English usage and composition, Math,
History, Social Studies, etc. The subtest scores are usually converted
into grade equivalent scores. A student with a grade equivalent of 5.1
is said to be functioning at the first month (October) of grade 5. (The
decimal portion of the scale divides the 10-month school year such
that September is 0, October is 1, November is 2, etc., until June is
9.) The basis for these grade equivalent scores stems from an analysis
of a national sample of students. Tests such as these will undoubtedly
form the backbone of any statewide assessments.
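
The arithmetic behind the ratio IQ, the expected rate of growth, and
the grade equivalent scale can be put in a brief sketch. All function
names here are illustrative assumptions. The factor of 100 in the
ratio is the conventional scaling, and the growth rule is simply the
rule of thumb stated above (10 months per year at IQ 100), not a
validated prediction model.

```python
# Months of the 10-month school year, indexed by the decimal digit of a
# grade equivalent score (September is 0 ... June is 9).
SCHOOL_MONTHS = [
    "September", "October", "November", "December", "January",
    "February", "March", "April", "May", "June",
]


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age / chronological_age


def expected_months_of_growth(iq, school_year_months=10):
    """Rule-of-thumb growth per school year: IQ 100 -> 10 months."""
    return school_year_months * iq / 100.0


def read_grade_equivalent(score_text):
    """Split a grade equivalent such as "5.1" into (grade, month name).

    The score is taken as text so that the single decimal digit is read
    exactly rather than through floating-point arithmetic.
    """
    grade_text, month_digit = score_text.split(".")
    return int(grade_text), SCHOOL_MONTHS[int(month_digit)]


print(ratio_iq(mental_age=8, chronological_age=10))
print(expected_months_of_growth(80))
print(read_grade_equivalent("5.1"))
```

Under these assumptions a ten-year-old with a mental age of eight has
a ratio IQ of 80 and would be expected to gain about 8 months of grade
achievement per school year, and a grade equivalent of 5.1 reads as
October of grade 5.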

Aptitude Tests: Aptitude tests are designed to measure a student's
propensity for a given course of study (say geometry) or an occupa-
tional field of work. The tests are designed such that a person scoring
high on the test would also be likely to be successful either in a given 
course or field of work. Aptitude tests are prerequisites for entrance 
into most medical, dental, law, engineering, and teaching schools. 
Tests such as the Miller Analogies Test and the Graduate Record 
Examination are required for graduate study. The Strong Vocational
Interest Blank and Kuder Preference Record provide valuable informa- 
tion which students use (with educational guidance) to select the next 
steps in their career planning. 

Diagnostic Tests: Diagnostic tests are employed to discover the 
nature and degree of student learning difficulty. Standardized tests 
for reading and math diagnosis are available for the normal student 
who experiences difficulty in one or both of these areas. For the 
special education youngster whose IQ is less than 90, a whole battery
of standardized instruments is designed to diagnose the specific learn- 
ing dysfunction. Among these are: 
Bender - Gestalt; 

Frostig Test of Visual Perception; 

Wepman Test of Auditory Discrimination; and 

Illinois Test of Psycholinguistic Abilities. 
It should be noted each kind of measuring instrument just described 
was designed for certain educational purposes. The procedures to 
validate each of these measures, to gather large groups of representa-
tive students, and to derive the interpretation of the test results, are
both rigorous and exacting. It is toward an examination of the char- 
acteristics of a good measuring device that we now turn our attention. 

Characteristics of a Good Measuring Device 

An effective measuring device should have the following five char-
acteristics: (1) validity, (2) reliability, (3) objectivity, (4) compre-
hensiveness, and (5) differentiation.

Validity: A test is valid if it measures what it was intended to 
measure and nothing else. The American Psychological Association's 
Technical Recommendation for Psychological Tests and Diagnostic 
Techniques (1954) identifies four categories of validity. These are
content, predictive, concurrent, and construct validity. 

Content Validity: Content validity involves essentially the
systematic examination of the test content to determine whether
it covers a representative sample of the behavior domain to be
measured (Anastasi, 1966). At first glance the determination of
content validity would seem to be a rather easy task. The catch
comes when one has to say that the questions cover all major as-
pects of the course content in the appropriate proportion. Some
areas of learning lend themselves quite readily to objective ques-
tions, e.g., true-false, multiple choice, matching. It is considerably
more difficult to generate questions for concepts which are abstract
and global. Another concern is, does the test (usually achievement)
measure the objectives of the instructor as well as the objectives
for subject matter? To check statistically for content validity it
may be necessary to prepare parallel forms of the test. Using this
procedure one form is given as a pretest, and the second as the post-
test. The gain made in score provides some evidence of its content
validity.

Predictive Validity: Predictive validity indicates the effectiveness 
of a test in predicting some future outcome (Anastasi, 1966). A 
student's scores are checked against some measure of success on a 
given job or course of study. The on the job performance is usually 
regarded as the criterion. Predictive validity of a test is required 
before the instrument can be used as a screening device for hiring 
job applicants, selecting students for admission to college or pro- 
fessional schools, etc. 

Construct Validity: The construct validity of a test is the extent
to which the test may be said to measure a theoretical construct or
trait. Examples of such constructs are intelligence, mechanical com-
prehension, verbal fluency, speed of walking, neuroticism, and
anxiety (Anastasi, 1966). Construct validity of a test can be ascer-
tained over time. For example, with respect to intelligence testing,
did the ratio of mental age to chronological age remain stable as
the student got older? Other statistical techniques employed to
assess the construct validity of a test are: (1) the correlation of this
test with scores on already validated tests which measure the same
traits; and (2) factor analysis, which groups the test items according
to traits or factors being measured.

Reliability: The reliability of a test refers to the consistency of
scores obtained by the same individuals on different occasions or
with different sets of equivalent items (Anastasi, 1966). The two
major types of reliability estimates are determined by either test-
retest correlation or by an internal consistency index. In test-retest
reliability (coefficient of stability), the test is readministered to the
same students after a time interval and a correlation coefficient
is computed. In computing an internal consistency reliability in-
dex (coefficient of equivalence) the test is divided in half, by
making a subtest of odd-numbered items and another subtest of
even-numbered items or by making subtests of the first and second
halves of the test, and the correlation between the scores on the
two halves is computed.

Objectivity: A test has objectivity if the same score is assigned
no matter who corrects the test. Thus, true-false, multiple-choice,
and matching items lend themselves to objectivity; essay items do
not.

Comprehensiveness: A test is comprehensive if the test items
adequately sample the full range of objectives and subject matter
in a curriculum. Careful attention needs to be paid, lest trivial in-
formation items account for more than their proportionate share
of the test. So while the test is designed with careful attention
that all aspects of the educational objectives and course content
are included in the instrument, it must also be assured that the
number of items pertaining to a given content is consistent with
that content's importance.
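
One way to check proportionate coverage is to compare each content
area's share of the test items with the weight the curriculum assigns
to it. The sketch below is only an illustration: the area names, the
intended weights, and the 0.10 tolerance are all assumptions made for
the example, not values taken from any curriculum or testing standard.

```python
from collections import Counter


def coverage_gaps(item_areas, intended_weights, tolerance=0.10):
    """Return the content areas whose share of items is out of proportion.

    item_areas       -- one content-area label per test item
    intended_weights -- area -> intended share of the test (sums to 1.0)
    tolerance        -- assumed allowable difference between the two shares
    """
    if not item_areas:
        raise ValueError("the test must contain at least one item")
    counts = Counter(item_areas)
    gaps = {}
    for area, intended in intended_weights.items():
        actual = counts.get(area, 0) / len(item_areas)
        if abs(actual - intended) > tolerance:
            gaps[area] = (actual, intended)
    return gaps


# Hypothetical ten-item test dominated by recall of facts.
items = ["facts"] * 6 + ["concepts"] * 3 + ["applications"] * 1
weights = {"facts": 0.25, "concepts": 0.35, "applications": 0.40}

for area, (actual, intended) in coverage_gaps(items, weights).items():
    print(f"{area}: {actual:.0%} of items, intended {intended:.0%}")
```

In this invented case the factual items take up far more than their
intended share and the application items far less, which is the kind
of imbalance the paragraph above warns against.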

Differentiation: Items on a measuring instrument have differen-
tiation when the brighter students get the item correct, and the
less intellectually talented students get the item wrong. The same
thing can be said for the test as a whole. In item analysis, the ease
index is the ratio of the number of students who scored correctly
on the item to the total number of students who took the test:

$$\text{E.I.} = \frac{\text{number of students correct}}{\text{total number of students}}$$

Professional test writers seek items having ease indices in the .40 -
.70 range for achievement tests. It should be noted that differenti-
ation may not always be desired, especially for locally prepared
achievement tests. Some educational objectives require mastery
level learning by all students. The major concepts in a course, for
example, might require student mastery before extensions of these
concepts can be learned. When local achievement tests are designed,
items measuring the student learning of mastery material should
have ease indices of .85 and higher. So, a note of caution is suggested
when critiquing a test for discrimination. The critiquer must ask,
"Does the objective that this question is written against require
mastery learning by the student?" If the answer is "Yes," then an E.I.
of .85 or more is to be expected. If the answer is "No" or "not neces-
sarily," then E.I.s in the .40 - .70 range are to be anticipated.
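
The ease index and the two expected ranges can be worked through in a
few lines. The function names are illustrative assumptions; the cutoff
values (.85 and above for mastery material, .40 to .70 otherwise) are
the ones given in the text.

```python
def ease_index(number_correct, number_of_students):
    """Ease index: proportion of students answering the item correctly."""
    if number_of_students <= 0:
        raise ValueError("at least one student must have taken the test")
    return number_correct / number_of_students


def expected_range(requires_mastery):
    """Expected E.I. range for an item, following the text's guidance."""
    return (0.85, 1.00) if requires_mastery else (0.40, 0.70)


def item_is_as_expected(number_correct, number_of_students, requires_mastery):
    """True when the item's ease index falls inside its expected range."""
    low, high = expected_range(requires_mastery)
    return low <= ease_index(number_correct, number_of_students) <= high


# Hypothetical item answered correctly by 18 of 30 students.
print(ease_index(18, 30))
print(item_is_as_expected(18, 30, requires_mastery=False))
print(item_is_as_expected(18, 30, requires_mastery=True))
```

The same item (E.I. of .60) is acceptable when it is meant to
differentiate among students but falls short when the objective behind
it calls for mastery by all students.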

It should be noted that the characteristics of good measuring
devices apply not only to commercially developed instruments,
but also to those generated locally. The educator's purpose for
measurement, as well as corresponding interpretations, are the
topics to which we now turn and which will complete our examination
of educational measurement.

Norm-Referenced vs. Criterion-Referenced Measurement: The
standardized measuring instruments described thus far in our section
on educational measurement are best described as norm-referenced
instruments. The norm-referenced label refers to the purpose of devel-
oping the instrument in the first place, and the interpretations which
can be made after a student has taken the test. Take, for example,
the Stanford Achievement Test. The Stanford Achievement Test has
been developed and validated to reflect what the test developers felt
were common elements of national student learning for reading,
English, science, social studies and math for respective elementary
grade levels. The various sub-scores attained by a student are trans-
formed into percentiles or grade equivalent scores so a given student's
progress can be compared to the attainment of many other similar
students nationally. This practice has the advantages of: (1) being able
to see how individual students rank on a national standard; (2) getting
a feeling of the relationship of the local curricula to that which is
being taught nationally to similar students; and (3) providing some
accountability information to the local public which is subsidizing
public education. Norm-referenced instruments may or may not be
relevant to either the number of objectives or the relative emphasis
of the various educational programs.

How then does an educational administrator assure that the testing
program measures in fact the relevant educational objectives of his
programs? At least two alternative solutions appear possible: (1) the
local school district could adopt the objectives of the norm-referenced
instrument, and go with a national curriculum, which is not too ten-
able; or (2) it could develop its own local achievement test which
measures its objectives. To do the latter, the administrator must
resort to some form of a criterion-referenced testing program. A
criterion-referenced instrument is one whose questions are specifically
tailored to measure the educational objectives of the local school
program, and for which a level of attainment is specified. The
criterion-referenced testing movement has been gaining momentum in
American education since the mid-1960s. In criterion-referenced testing
it is not only possible for a student to achieve a perfect score, it is
desirable. Criterion-referenced testing (where the local goals and
objectives are the criteria) provides an ideal solution to measuring
student learning which is relevant to the local educational program,
but this solution is not without its problems.

First, the construction of measuring instruments is a highly special-
ized skill which few local educators have had either the academic
training to develop or the experience and expertise to analyze and
norm the results.

Second, some of the national apostles, disciples, and followers of
the criterion-referenced movement reject the necessity for reliability
and differentiation as necessary prerequisites for criterion-referenced
instruments under the guise that these instruments should reflect
mastery learning. This second problem will be elaborated upon in the
criteria section for reliability, but for now, it suffices to say that mak-
ing measuring instruments relevant to local curricula in no way means
that one should discard the time-honored and validated requisites
for good test construction. The notion that criterion-referenced
measuring instruments should measure mastery-level objectives is
similarly absurd. The American public educational program is based
upon the notion that each child has the opportunity to learn to the
extent of his or her talents. Certainly the notion of individual stu-
dent differences would dictate that the criterion-referenced measur-
ing instrument should provide challenge for the gifted child, and that
the child whose talent is centered in a given idiosyncrasy (say, math,
English, social studies, etc.) also should have his learning measured.

Third, the local school district, even after it has developed and
validated its criterion-referenced instruments, still requires some data
relative to the comparability of its findings with other school districts
in the state or the region. This problem makes a strong case for
having district-to-district communication operating within a state or a
geographic region of the country. Each student completing a grade
K-12 public educational program matriculates into the next step in
his preparation for a career. Whether the next step for the student is
college, community college, technical training, or the world of work,
grades K-12 learning experiences should have been equally relevant
for such a transition.

Thus, while criterion-referenced measurement has much to recommend
it, a matrix of educational problems must be overcome to assure
that it is being employed effectively. The need for local districts to
develop this expertise in criterion-referenced measurement was reinforced
in the present session of Congress when a bill requiring
criterion-referenced testing was introduced (HR-69) to justify a district's
entitlement to Title I, ESEA funds beginning with FY-75. If this
requirement is in the final law, considerable progress will be made in
criterion-referenced measurement for school districts vying for future
Title I funds.

Needs Assessment: Twenty-three of the fifty states have educational
accountability legislation. Most statutes require that local
school districts design their accountability programs from data gathered
by districtwide educational needs assessment. A needs assessment
instrument is one which is designed to measure the school district's
educational needs as perceived by educators, students, parents,
commerce and business, the trades, and members of the public-at-large.
The instrument itself is typically designed to measure objectively the
content of the various educational programs in existence in the district.
Respondents also have the opportunity to communicate their
desires for the direction the educational program should take as well
as their critique of the existing program. Local educators then synthesize
these data as new goals and objectives for the district are redesigned.
Within the recent past, needs assessment has provided educators
with a mode of communication with the public which has heretofore
not been available for the modification of local educational
curricula. In Colorado, for example, virtually all of the 181 school
districts conducted needs assessment as an integral part of their local
accountability program, and have adjusted the planning of educational
programs accordingly. These needs assessment measuring devices
have provided local educators with an opportunity to optimize their
educational offerings to coincide with the manifest desires of the
public which they serve.

National Assessment: In the mid-1960s, the Education Commission 
of the States, a non-profit organization representing over 40 states 
and territories, established a subsidiary, the National Assessment of 
Educational Progress (NAEP). NAEP was designed to measure the 
educational attainment of a sample of the population ages 9, 13, 17 
and 26-35 across a range of educational competencies in an effort to 
assist local educators in appraising the effectiveness of educational 
programs on both a national and a regional basis. Assessment instru- 
ments are and will be designed to measure the following: 
Cycle I
    1969-70    Citizenship, Science, Writing
    1970-71    Reading, Literature
    1971-72    Social Studies, Music
    1972-73    Math, Science
    1973-74    Writing, Career and Occupational Development
    1974-75    Art, Citizenship

Cycle II
    1975-76    Reading, Literature
    1976-77    Music, Social Studies
    1977-78    Math, Science
    1978-79    Writing, Career and Occupational Development
    1980-81    Reading, Literature

(Compact, Feb. 1972, Vol. 6, No. 1, p. 17)






The first few reports generated by NAEP have resulted in mixed
reactions by local educators. NAEP is solely an assessment (measurement)
activity. The purpose was to assess (measure) the knowledge,
understanding, abilities, and feelings of learners toward the subject
matters in the schedule described above.

Some local educators have reacted negatively to NAEP reports because:
(1) national and regional findings only were reported (NAEP
never promised more than this); (2) the age categories represented
criteria which tended to confuse the usual grade level data with which
the local educators were more familiar; and (3) the NAEP reports
contained few, if any, analyses resulting from the assessment other
than the presentation of data. (In all fairness to NAEP, assessment
rather than evaluation [judgment based on assessment] is their raison
d'être, and the local educator's critique should be, but isn't, from
that frame of reference.) It should also be noted that educational
agencies more regional or national in scope (notably the U.S. Office
of Education, NAEP's principal benefactor) have found the NAEP
results to be extremely valuable.

Local educators are used to having the data measuring students'
achievement presented in some sort of perspective. They expect that
these data will provide them with a series of specific instances where
their educational programs are succeeding, and where they are not.
Usually, local school district administrative staff data are not attuned
to the educational objective level (by subject matter) documentation
reported in NAEP studies. Thus, while it is interesting for the local
educational administrator to know that 62% of the students in his
region do not know basic tenets of social studies measured on the
1971-72 NAEP instrument, he must still go out and develop his own
social studies instrument to see if his students lack these competencies,
even though his district might have been a subject in the NAEP
1971-72 study. This, most districts are neither staffed nor willing to do.

NAEP's reporting on a chronological age basis of 9, 13 or 17 years
old similarly has caused problems in local educational agencies. Since
the beginning of the testing movement in the country, students have
been grouped on a grade level basis. While it is recognized that for a
given grade level, students of differing chronological age are present,
the learning achievement of these differing chronological age students
could be measured because they had been exposed to a comparable
level of course content. NAEP, in its reports, categorizes students
according to chronological age and reports their (the students')
learning in a given subject matter area by the age criterion. Local educators
are entitled to feel that the NAEP findings present a confounding
variable to data which they already have, because this is what has
occurred. Again, the local district must assess over multiple grade
levels to determine whether the NAEP findings pertain to their
students aged 9, 13 and 17.

The third LEA (local educational agency) objection to the NAEP
reports relates to the unwillingness of NAEP to analyze their data
locally and make recommendations. As a national subsidiary of the
Education Commission of the States, NAEP has felt a need to remain
apolitical in reporting their data on a regional and a national basis.
As time goes on, and NAEP, established and functioning as a viable,
measurable component of American public education, continues to
gain national, regional and state recognition for their expertise in
assessing the state of public education, perhaps analysis and
recommendations can appropriately be made on a state and local basis.

NAEP has made a commendable contribution to measurement of
student learning in public education. The foregoing dialectic was in
no way an effort to impugn what NAEP has attempted to do to assist
in assessing public education nationwide. The reporting parameters
have been different, but if NAEP wants to remain in the fore, it must
either: (1) provide local educators with the background to cope with
and interpret NAEP reports locally; or (2) revise their reporting
format so they (NAEP) provide local interpretation and suggestions for
improvement. Although neither the purpose nor the operational
procedures of NAEP require this level of detailed analysis, local educators
need this level of detail to modify their existing educational
programs. At present, an impasse appears to exist between NAEP
and local educational agencies.






CHAPTER V 



Evaluation 

Evaluation is judgment based upon criteria. This classic definition
of educational evaluation has withstood the test of over a half-century
of use in the field of public education. Educational evaluation is a
terminal activity associated with educational measurement. Evaluation
seeks to answer the question So what? when it is put to data. Asking
the simple question So what? may be a terse verbal overture. The
appropriate reply, when applied to a diversity of educational applications,
is given in terms of statistical probabilities. Educational measures
are not absolute truth but are merely contrasted against the likelihood
of chance.

Virtually every human trait (intelligence, height, weight, etc.) when 
measured is distributed according to the normal probability curve. 
The normal curve has certain characteristics as shown in Figure 1.



FIGURE 1
The Normal Curve

[Figure: a bell-shaped normal curve with the mean marked at its peak.]

An inspection of Figure 1 reveals that the curve peaks around
the midway point along the horizontal axis. When the curve is truly
normal, the point on the horizontal scale directly below the highest
point on the curve describes three statistical measures of central
tendency. They are:

Measure of
Central Tendency      Definition

1. Arithmetic Mean    The sum of all the scores along the curve
                      divided by the number of observations.

2. Median             The midpoint of the curve, which divides
                      the normal curve into two equal areas.

3. Mode               That point along the horizontal axis at
                      which the normal curve reaches its highest
                      point.

The mean, median and mode are employed as the measures of
central tendency in most educational statistical computations. The
mean is used as a measure of central tendency when the scores are
normally distributed and the quantity being measured is being reported
on an interval scale. (An interval scale is one which has equal
units of measure at all points along the scale; weight in pounds,
height in inches, scores, raw scores on a test, etc., are each examples
of interval measurement.)

The median is used as a measure of central tendency when the
shape of the distribution is influenced by one or more extreme scores.
Under such circumstances the middle score rather than the mean
would be the most appropriate measure of central tendency. The
following example illustrates the appropriate use of the median
rather than the mean in an educational setting.

Suppose the local board of education is desirous of knowing the
average compensation paid to the professional staff in their small
district. The salaries are:

1 Superintendent           $20,000
1 High School Principal    $18,000
1 Elementary Principal     $15,000
7 Teachers                 $ 8,500
6 Teachers                 $ 7,500
3 Teachers                 $ 7,000

If the mean salary ($9,394.74) were reported to the school board
rather than the mid-score (median) value of $8,500.00, the reported
value would have been inflated by $894.74. Note that the median
score is not influenced by extreme values. If the superintendent's
salary were increased to $29,500, the mean score would increase
$500, while the median score would remain the same.
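
The arithmetic in this example can be checked with a short Python
sketch. It is an illustration only: the variable names are hypothetical,
and the count of three teachers at $7,000 is an assumption, chosen
because it reproduces the reported mean to within a cent.

```python
from statistics import mean, median

# Salaries from the example; three teachers at $7,000 is an assumption
# that makes the staff total 19 and matches the reported mean.
salaries = (
    [20_000, 18_000, 15_000]   # superintendent and two principals
    + [8_500] * 7
    + [7_500] * 6
    + [7_000] * 3
)

mean_salary = mean(salaries)      # pulled upward by the three top salaries
median_salary = median(salaries)  # the middle (10th of 19) salary

# Raise only the superintendent's salary and recompute both measures.
raised = [29_500] + salaries[1:]
mean_shift = mean(raised) - mean_salary
median_shift = median(raised) - median_salary

print(round(mean_salary, 2), median_salary, round(mean_shift, 2), median_shift)
```

Only the mean moves when the single extreme salary changes, which is
the reason the median is preferred for a distribution of this shape.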

The mode is the most frequent score that occurs in a given distribution.
In a truly normal distribution the mode, median and mean
occur at the same point on the scale. Sometimes, more than one
modal point occurs in a distribution, as in Figure 2.



FIGURE 2
A Bi-Modal Distribution

[Figure: a distribution with two peaks, one above 5 and one above 9
on the horizontal scale.]




In Figure 2 modal points occur above 5 and 9 on the horizontal
scale. As one observes multimodal data, he tends to conclude that what
was measured and displayed came from a population having more
than one characteristic. For example, if Figure 2 represented data on
the numbers of people having different shoe sizes, the bimodal
distribution resulted from combining both men and women in the sample
displayed. The first modal point encountered (5) most probably is
the central tendency for women's foot sizes; whereas the second
modal point, encountered at 9, represents the central tendency for men's
foot sizes.
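
A brief Python sketch shows how more than one modal point can be
detected. The shoe sizes below are invented for illustration and are
not data from this report; multimode is a function in Python's standard
statistics module.

```python
from statistics import multimode

# Hypothetical shoe sizes for a mixed sample of women and men.
shoe_sizes = [4, 5, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 9, 9, 10, 10]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modal_points = multimode(shoe_sizes)

print(modal_points)
```

Two values in the result suggest that two groups may have been combined
in one sample, as in the shoe size example.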

In summary, there are three statistical measures used to identify
the central tendency of a distribution of scores. These are the mean
(arithmetic average), the median (middle score) and the mode (the
most frequent score).

The normal probability curve also can be divided into a series of
equal units along the horizontal axis which partitions the area of the
normal curve into a series of standard units called the standard
deviation. The standard deviation is a statistical measure such that one
standard deviation unit on each side of the mean includes approximately
two-thirds of the number of cases in a given distribution.
Standard deviation scores are represented by a scale of z scores which
has a mean of zero (0) and a standard deviation value of one (1). This
relationship is described in Figure 3.



FIGURE 3
Relationships among Different Types of
Test Scores and the Normal Curve

[Figure: a normal curve showing the percentage of cases in each
standard deviation band, drawn above a set of aligned score scales.]

Percent of cases in each band, from left to right:
0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, 0.13%

z score            -3     -2     -1      0     +1     +2     +3
ACT score           5     10     15     20     25     30     35
CEEB score        200    300    400    500    600    700    800
Deviation IQ       55     70     85    100    115    130    145
  (SD = 15)
T score            20     30     40     50     60     70     80

Stanine (not aligned to the columns above):     1 2 3 4 5 6 7 8 9
Percentile (not aligned to the columns above):  1 10 20 30 40 50 60 70 80 90 99

In Figure 3 note that one standard deviation on each side of
the mean includes 68% of the area under the curve (or the number
of people contained in a distribution of scores). Between +1 and +2
and between -1 and -2 another 13.59% is contained, respectively. As can
be observed, the z score partitions the horizontal axis under the normal
curve into a convenient number of equal intervals, and provides the
statistician with a clear picture of where a given student's score falls
along the scale. For example, if a student's achievement
on a test is equivalent to a z score of +2, the evaluator knows that
this attainment is two standard deviations above the mean achievement
for all students who took the test. It also places him within
roughly the top 2% of those students (2.14% plus 0.13% in Figure 3).
Reporting test results to students and
parents in terms of z scores can be somewhat disconcerting. Imagine
the chagrin of the parents and the child whose achievement was
exactly on the mean. His z score would be zero. Needless to say, the
teacher would have to expend considerable time with both the parents
and the child to assure that they understood that not only was the
child's achievement greater than zero, but that his performance was
perfectly acceptable against an average criterion.
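
The z score and the areas it marks off can be reproduced with a few
lines of Python. This is a sketch for illustration: the function name
z_score and the sample test (mean 70, standard deviation 5) are
assumptions, while NormalDist comes from Python's standard statistics
module.

```python
from statistics import NormalDist

def z_score(raw_score, group_mean, group_sd):
    """Distance of a raw score from the group mean, in standard deviations."""
    return (raw_score - group_mean) / group_sd

# Hypothetical test: mean 70, standard deviation 5, student scores 80.
student_z = z_score(80, 70, 5)

unit_normal = NormalDist()  # mean 0, standard deviation 1

# Share of cases within one standard deviation of the mean.
within_one_sd = unit_normal.cdf(1) - unit_normal.cdf(-1)
# Share of cases between +1 and +2 standard deviations.
between_one_and_two = unit_normal.cdf(2) - unit_normal.cdf(1)
# Share of cases above +2 standard deviations.
above_two = 1 - unit_normal.cdf(2)

print(student_z, round(within_one_sd, 4),
      round(between_one_and_two, 4), round(above_two, 4))
```

The three areas correspond to the 68%, 13.59% and roughly 2% figures
read from the normal curve.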

While a standard deviation unit scale (often called standard scores)
provides valuable statistical information, for reporting purposes other
standard scales have been produced. McCall, for example, developed
the T score scale which has a mean of 50 and each standard deviation
is assigned a value of 10. The American College Testing instrument
has a mean of 20 and a standard deviation of 5. The College Entrance
Examination Board test battery has a standard score scale which has
500 for a mean and 100 for a standard deviation. The Wechsler
Intelligence Scale has a mean of 100 and a standard deviation of 15.
It becomes readily apparent that any specialized standard score can
be created by arbitrarily assigning values for the mean and the
standard deviation.
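
Because each of these scales is defined only by a mean and a standard
deviation, a z score can be converted to any of them with one line of
arithmetic. The Python sketch below uses the means and standard
deviations quoted above; the names SCALES and to_scale are hypothetical.

```python
# (mean, standard deviation) for each scale, as quoted in the text.
SCALES = {
    "T": (50, 10),
    "ACT": (20, 5),
    "CEEB": (500, 100),
    "Deviation IQ": (100, 15),
}

def to_scale(z, scale_name):
    """Convert a z score to the named standard score scale."""
    scale_mean, scale_sd = SCALES[scale_name]
    return scale_mean + z * scale_sd

# A student two standard deviations above the mean on every scale.
converted = {name: to_scale(2, name) for name in SCALES}
print(converted)
```

A z score of zero lands on the mean of every scale (50, 20, 500 or 100),
which avoids reporting an average performance as zero.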

The stanine scale divides the 100% area under the normal curve
into nine divisions. Stanine scores have a range from 1 to 9 with an
average of 5. These scores have been valuable in predicting success in
pilot training and specific college courses.

The percentile scale represents a cumulative recording of the area
under the normal curve. Percentile scores present a relative standing
for any individual vis-a-vis his standing in the group of people who
also have taken the test. If a student's achievement on a given test
places him at the 70%-ile, this means that his achievement was the
same as or better than that of 70% of the students on whom the test
was normed. From mere observation the reader can ascertain that the
percentile scale is tightly packed around the middle and widely
extended in terms of numerical intervals on either extreme. Since
percentile scores cannot achieve the requirements of interval data, they
cannot be added, subtracted, multiplied or divided for use in statistical
manipulation. Percentiles, nevertheless, are widely used in reporting
achievement results to parents, because the concept of percentiles is
closely related to the notion of percentage, a concept with which
parents are familiar.
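
Both points in this paragraph can be sketched in Python: how a
percentile rank is counted, and why equal percentile steps are not equal
steps along the score scale. The function name percentile_rank and the
ten norm-group scores are assumptions made for illustration.

```python
from statistics import NormalDist

def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring the same as or below this score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical norm group of ten students with scores 1 through 10.
norm_group = list(range(1, 11))
rank_of_seven = percentile_rank(7, norm_group)

# Width, in z score units, of a ten-point percentile step near the
# middle of the normal curve and near its upper extreme.
unit_normal = NormalDist()
gap_middle = unit_normal.inv_cdf(0.60) - unit_normal.inv_cdf(0.50)
gap_extreme = unit_normal.inv_cdf(0.99) - unit_normal.inv_cdf(0.89)

print(rank_of_seven, round(gap_middle, 2), round(gap_extreme, 2))
```

The step near the extreme spans several times the distance of the step
near the middle, which is why percentile scores do not behave as
interval data.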

Norms 

When measuring instruments are developed, they are usually field
tested on a large sample of students. The sampling procedure employed
by most testing companies is usually both exhaustive and comprehensive.
Students are selected for inclusion in either a national or
regional sample on either a random or a stratified basis. A random
sample is one where every person eligible to be selected has an equal
opportunity to be included in the sample. Random sampling represents
an underlying mathematical assumption for parametric statistics
such as mean comparisons, analysis of variance, etc. A stratified sample
is employed if the sampler is interested in obtaining students who
have differing desirable characteristics in some proportion. It is quite
common to use socio-economic status, sex, ethnic background, etc.
as the basis of some proportionate stratification. Within each stratum,
subjects are randomly selected for inclusion in the sample.
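
The two sampling approaches can be sketched in Python as follows. This
is a minimal illustration, not the procedure of any testing company: the
function names, the student records, and the "region" stratum are all
hypothetical.

```python
import random
from collections import defaultdict

def simple_random_sample(students, size, seed=0):
    """Every eligible student has an equal chance of selection."""
    return random.Random(seed).sample(students, size)

def stratified_sample(students, stratum_of, fraction, seed=0):
    """Randomly select the same fraction of students within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student in students:
        strata[stratum_of(student)].append(student)
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 60 urban and 40 rural students.
students = [
    {"id": i, "region": "urban" if i < 60 else "rural"} for i in range(100)
]
random_sample = simple_random_sample(students, 10)
proportional = stratified_sample(students, lambda s: s["region"], 0.10)
urban_count = sum(1 for s in proportional if s["region"] == "urban")

print(len(random_sample), len(proportional), urban_count)
```

The stratified sample preserves the 60-to-40 proportion exactly, whereas
a simple random sample of ten may or may not.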

After the students have been selected for inclusion in the sample,
and the test has been administered to them, mean and standard
deviation values are determined. From the mean and standard deviation
statistics, specialized standard scores are developed (e.g., CEEB, ACT,
T score and z score) and for each standard score, a percentile equivalent,
grade equivalent or similar score is generated for norming purposes.
These norms are developed on either a regional or national
basis for the various strata employed. Then, when this test is
administered in a local school district, each student's achievement can
be compared against these regional or national norms.

It should be stated that as an artifact of statistics the use of norms 
results in decisions such that half the students measured score above 
and half score below the 50%-ile. In this statistical sense the percentile 
rank of a given school (or student) makes no comparison relative to 
the competency level in the subject matter that has been attained. 

Criterion-referenced measurement practices tend to report a given 
school or school district's level of competence against each of the 
educational objectives which the instrument was designed to measure. 
If the criterion of success is agreed before testing to be 70%, 75% or 
80%, a comparison of the average ease index for the items measuring 
a given objective provides the evaluator with the data required to 
ascertain if the achievement met or exceeded the criterion standards. 
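
The comparison of an average ease index against a criterion can be
sketched in Python. The sketch assumes the ease index of an item is the
proportion of students answering it correctly; that definition, the
function names, and the response data are assumptions for illustration.

```python
def ease_index(responses):
    """Proportion of correct answers for one item (1 = right, 0 = wrong)."""
    return sum(responses) / len(responses)

def objective_result(item_responses, criterion):
    """Average ease index over an objective's items, and whether it
    meets or exceeds the criterion agreed upon before testing."""
    indices = [ease_index(responses) for responses in item_responses]
    average = sum(indices) / len(indices)
    return average, average >= criterion

# Hypothetical objective measured by three items, four students each.
items = [
    [1, 1, 1, 0],  # ease index 0.75
    [1, 1, 0, 0],  # ease index 0.50
    [1, 1, 1, 1],  # ease index 1.00
]
average, met_at_70 = objective_result(items, criterion=0.70)
_, met_at_80 = objective_result(items, criterion=0.80)

print(round(average, 2), met_at_70, met_at_80)
```

The same data meet a 70% criterion and fail an 80% criterion, which is
why the standard must be agreed upon before testing.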

Ranking students against national, regional or state norms may 
provide useful information for educational guidance and counseling 
for that student and, as such, could be a good thing. Realistically, 
there is little educational benefit to be derived from such normative 
comparisons for individual schools or school districts. School and 
school district data should be normed against criteria rather than 
normative standards. The educational effectiveness of a given school 
or school district ultimately rests in the educational competencies of 
its students. This student competency can most validly be assessed 
against the actual educational objectives in use in the school. Then,
and only then, can some judgment be rendered relative to the educational
quality, effectiveness, or the cost efficiency of a given school
district's educational program.

A Point of Compromise 

Up to this point, the reader has probably gained the impression
that norm-referenced measurement is employed to rate students against
some national standing; whereas criterion-referenced testing purports
to measure the student mastery of learning objectives contained in a
given school district's educational program. This impression is a valid
one. A point should be made, however, that the comparison of data
yielded from each measurement device need not necessarily result in
an either-or type of measurement decision. Certainly, a given
norm-referenced instrument does not contain items which comprehensively
measure the content of a given district's English, math, reading or
social studies programs, but chances are that it does measure a significant
portion of these respective programs (else why did the faculty
recommend the use of a given norm-referenced instrument?). The
questions on the norm-referenced instrument could then be related
to the school district's program objectives. When this task is completed,
some local objectives will have five or more questions, others
one or two questions, and still other objectives will have no questions
on the norm-referenced instrument.

An analysis of these items and objectives will reveal objectives which
have been partially measured and others that have not been measured
at all. This analysis then becomes the basis for constructing a
criterion-referenced instrument. At testing time, both instruments are
administered. The usual norm data are available for use in the traditional
manner for each student. Similarly, the student answers for
both instruments are partitioned by local district educational objective
to ascertain whether the obtained ease indices meet the established
criteria. This dual approach would enable local educators to
conduct their districtwide assessment and relate these data to the
effectiveness of their local educational programs. This compromise
measurement procedure would provide the local district with the lowest
possible cost for the criterion-referenced measurement and evaluation
of their educational program.
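
The step of relating existing test questions to local objectives can be
sketched in Python. The objectives, the item numbers, and the function
name below are hypothetical, and treating five items as adequate
coverage is an assumption drawn loosely from the "five or more
questions" remark above.

```python
from collections import Counter

def items_per_objective(objectives, item_to_objective):
    """Count how many existing test items measure each local objective."""
    counts = Counter(item_to_objective.values())
    return {objective: counts[objective] for objective in objectives}

# Hypothetical local math objectives and a mapping of eight
# norm-referenced test items to those objectives.
objectives = ["fractions", "decimals", "measurement", "graphs"]
item_to_objective = {
    1: "fractions", 2: "fractions", 3: "fractions",
    4: "fractions", 5: "fractions",
    6: "decimals", 7: "decimals",
    8: "measurement",
}

coverage = items_per_objective(objectives, item_to_objective)
unmeasured = [o for o, n in coverage.items() if n == 0]
partially_measured = [o for o, n in coverage.items() if 0 < n < 5]

print(coverage, unmeasured, partially_measured)
```

The unmeasured and partially measured objectives are the ones for which
new criterion-referenced items would need to be written.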

Many school districts who employ norm-referenced measurement
also purport to have programs which individualize instruction. Certain
individualized programs such as Individually Prescribed Instruction
(IPI); Individually Guided Education (IGE); Project PLAN
(Westinghouse Learning Corp.); etc., each have the notion of student
learning as their goals. Norm-referenced instruments can be employed
to determine whether a given student is working up to his
potential through comparing the actual achievement with some
measure of his expected achievement.

In the early 1960s the California Testing Company (now a division
of McGraw-Hill) incorporated an intellectual status index into the
California Achievement Test Battery. The intellectual status index
employed the mental age portion of the student's IQ to project how
much progress a given child should make in a school year. The technical
conversions to grade equivalent measures for each IQ score are then
made. The practice then was to compare actual achievement with expected
achievement and determine whether the student was an underachiever,
achiever, or overachiever on a given subtest. One problem
experienced when using the intellectual status index was that this
technique tended to overestimate a given student's predicted
achievement.

In 1969, Myklebust developed a Learning Quotient which is purported
to be a more reliable estimate of expected performance. The
learning quotient includes a more definitive and comprehensive in- 
dex of expectancy as it takes into account not only the mental age, 
but the chronological age representing physiological maturity, and 
the grade age representing an index of school experience as well. A 
learning quotient can be computed for each area of achievement (each 
subtest) and therefore provides individualistic information of a diag- 
nostic nature to help identify particular strengths and weaknesses 
for each child. 

Actual achievement (scores earned on achievement tests) is then
related to this expectancy and multiplied by 100, resulting in a
Learning Quotient:

$$\text{Learning Quotient} = \frac{\text{Achievement Age}}{\text{Expectancy Age}} \times 100$$

Learning quotients of 89 and below are interpreted to indicate a
substantial discrepancy between actual and expected achievement,
indicating a student learning deficiency requiring special attention.
Learning quotients of roughly 90 to 93 represent problem areas still
requiring attention but not quite as severe as the 89 and below L.Q.s.
Learning quotients of roughly 94 and above indicate that a child is
achieving at a level commensurate with his expectancy. A relative
pattern of strengths and weaknesses becomes apparent as L.Q.s in
different achievement areas are compared.
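
The computation and its interpretation bands can be sketched in Python.
The text says only that expectancy takes mental age, chronological age
and grade age into account; averaging the three, as done below, is an
assumption of this sketch, as are the function names and the handling of
fractional quotients at the band edges.

```python
def expectancy_age(mental_age, chronological_age, grade_age):
    """Assumed here to be the simple average of the three ages."""
    return (mental_age + chronological_age + grade_age) / 3

def learning_quotient(achievement_age, expectancy):
    """Achievement age as a percentage of expectancy age."""
    return achievement_age * 100 / expectancy

def interpret(lq):
    """Bands quoted in the text: 89 and below, 90 to 93, 94 and above."""
    if lq < 90:
        return "learning deficiency requiring special attention"
    if lq < 94:
        return "problem area still requiring attention"
    return "achieving commensurate with expectancy"

# Hypothetical child: ages in years, reading achievement age of 9.0.
expected = expectancy_age(10.5, 10.0, 9.5)
reading_lq = learning_quotient(9.0, expected)

print(expected, reading_lq, interpret(reading_lq))
```

Computing one quotient per subtest and comparing them gives the pattern
of strengths and weaknesses described above.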

The Foundation For Individualized Evaluation and Research, Inc.
(FIER), a non-profit foundation of DeKalb, Ill., provides low cost
computerized evaluation for school districts employing individualized
approaches in their educational programs. FIER has been validated by
the Westinghouse Learning Corporation to provide local evaluation
of school districts utilizing Project PLAN.

Whether the local district employs a combination of norm and
criterion-referenced data to assess their educational programs on a
local basis, or embarks on a computerized approach to measure the
individualized learning of its students, the results come out the same.
Areas of program strengths and areas of improvement result. It remains
then for the educational leaders in the district to evaluate these
data, and initiate programs of change, where appropriate.






CHAPTER VI 

STATEWIDE EDUCATIONAL EVALUATION 
Some Technical Considerations 

Some states have legislation which requires some form of state
testing program. The obvious intent behind this legislation was to
gather educational information which could provide the basis for
further educational legislation.

The expression statewide testing program has fallen into disuse
of late because of the emotional furor produced by the more vocal
elements of the public when the word testing is associated with a
county or state program. To counteract this public negativism, educators
have tended to substitute the word assessment to describe
their programs.

You are now faced with the prospect of having or not having some
kinds of programs which will describe student learning at all grade
levels in the state. These programs should be designed so local district
curricula can be examined and revised if found wanting. At the state
level, you need to know what kinds of student needs exist, and how
you might best apportion the state educational resources to remediate
these needs. You will also have to finance these projects from existing
state educational resources, and therefore insist that the program
cost be minimal. What do you do?

This dilemma is similar to one experienced by the former Dean of
the Faculty of the U.S. Air Force Academy. He was concerned that the
all-military teaching faculty had too much military rank and were
getting older. He was attempting to minimize the costs associated
with each cadet's education. The present faculty age and rank averaged
34 and major. He wanted a staff composed of rated pilots with
over five years flying experience (combat preferred), having Ph.D.
degrees in the areas to be taught at the Academy, exemplary military
records, and he wanted the professors to be 25 years old and First
Lieutenants! The same problem exists for you as state school board
members as you decide to establish a program which will provide you
with data reflecting the amount and quality of public education in
your state. The funds you expend for this program will diminish the
state educational resources available for distribution in support of
needed or innovative educational programs at the local level.

Now, what should you call such a program? You could call it a
statewide testing program (STP), but the expression testing has a strong
negative connotation. Some states have elected to call such a program a
Statewide Educational Assessment Program (SEAP). The expression
assessment, being synonymous with measurement, does not reflect the
true purpose of such a program. Assessment, especially as used by
National Assessment, merely describes the general conditions of
learning without making comments relative to student learning in
District X or District Y. For this reason, the expression assessment
is an inappropriate rubric for the program.

A Statewide Educational Evaluation (SEE) program perhaps comes
closer to describing what is desired than either of the other two suggested
titles. (Note that the letter P was not included in the acronym,
since see, meaning to perceive .... or examine, comes closer to the
intent of the project than SEEP, meaning a spot where water or
petroleum oozes out slowly and gathers in a pool (Webster's New
Collegiate Dictionary).) Literal elegance and accuracy aside, a Statewide
Educational Evaluation program certainly represents what is
desired. The program should measure or assess student learning by
grade level. To accomplish this, some mix of norm and criterion-referenced
instruments should be employed or developed. The results
should then be compared to other state, regional or national data,
and judgments relative to the quantity and quality of student learning
should be undertaken (evaluation). Finally, a summary of these findings
must be developed so that educational leaders, teachers, legislators,
parents, students and the public-at-large can read and understand
them.

Frank Womer's monograph Developing A Large Scale Assessment
Program describes in detail a systematic approach to developing and
implementing an assessment program. Reflecting his experience and
rigorous statistical training with National Assessment, Dr. Womer's
paradigm very closely represents that model.

The procedures which follow represent a less rigorous but also a
less costly alternative to developing, implementing and disseminating
the results of a Statewide Educational Evaluation (SEE) program.

Developing the Program: The SEE program, to be a successful venture,
must have the cooperative participation of legislators, state board of
education members, the state department of education, educational
leaders in the state, teachers, parents, students and the public-at-large.
The planning stage is crucial, but it also must be managed within a
reasonable (say 6 months) period of time. It has become an American
educational practice to place possibly embarrassing programs in an
indefinite tenure by forming a planning committee. This technique
soothes the emotions of the program zealots while assuring that no
immediate action will take place. After the committee makes its report,
the recommendations are then retained in the local board of
education's minutes, thereby preserving their perpetual obscurity.

Such a planning mode is not envisioned for the SEE project. Developing
the SEE program will involve at least six major steps. These
are: (1) appointing a SEE Project advisory committee to the state
board of education; (2) appointing an interim SEE Project director;
(3) determining the objectives to be assessed by grade level and academic
discipline; (4) developing a measurement and sampling strategy;
(5) developing the instruments; and (6) pilot testing the instruments
for validity and reliability.

SEE Project Advisory Committee: The state board of education should
appoint a SEE Project advisory committee composed of representatives
of the state board of education, legislature, state educational
professional societies, educators, the state department of education,
business and commerce, labor, the trades, and students. The advisory
committee must receive a specific charge from the state board of
education which outlines: its duties; its tenure (no more than 6
months); its role in coordination with the SEE Project director and
his staff; and the interval and requirements of the reports it should
make to the state board of education. Meetings of the advisory
committee will most likely progress on a decreasing frequency basis,
ranging from weekly to monthly during its tenure. The advisory
committee approach allows all interested points of view to impact
upon the embryonic program when modifications are relatively easy.
The advisory committee should have a definite budget under which it
carries out its activities. At its first meeting, the members will
elect a chairman who will provide the leadership, schedule the
frequency of meetings, and provide liaison to both the SEE Project
director and the state board of education. The advisory committee's
final report to the state board of education will summarize the
activities, highlights, problems encountered, and recommendations.

One obvious recommendation could be that the tenure of the advisory
committee should be extended for some duration. The implementation of
that and other recommendations, however, then becomes the prerogative
of the state board of education. Advisory committees will operate
differently from state to state depending upon factors such as the
geographical size of the state, the funds available for travel and
per diem, the need for and number of sub-committees which will be
formed, etc.

Appointing an Interim SEE Project Director: Concomitant with the
appointment of the SEE Project Advisory Committee, an interim SEE
Project Director should be appointed. The interim SEE Project
Director should be appointed for a two-year term and should possess an
educational background rich in measurement, evaluation, and
statistics. The interim SEE Project Director should be assigned to the
state department of education's staff so he reports directly to the
chief state school officer. The interim SEE Project Director will
manage all aspects of the SEE Project. He will select a staff,
coordinate activities with other concerned agencies, and supervise the
sampling, instrument development, data collection and analysis, and
the dissemination and follow up of the first final report. The
director will be required to solve a myriad of problems in executing
his term of office. These problems will be highlighted or eased
depending on the size of his budget, the comprehensiveness of the
sample, the use of norm-referenced or criterion-referenced
instruments, whether he must supervise the development of a unique
instrument, etc. After the first SEE report has been published and
disseminated, both the state board of education and the interim
director will probably want to take a long look at each other before
the decision to support a permanent director is made.

The Educational Programs to be Measured: This particular planning
component represents the key around which everything else must
inevitably evolve. It is also this event which most probably would
have the greatest variability from state to state. The notion that
education is a state function is a widely recognized truism across the
country. Some states reserve this prerogative and administer public
education in a regulatory manner through the state department of
education. Other states have chosen to have the local school districts
establish their own educational programs, and have enlisted the state
department of education personnel to function in a service role to
local educators. Still other states employ some combination of these
two extremes in administering the educational programs in their state.

The statement of common state goals and objectives for education
will vary depending upon the leadership style employed at the state
level. Regardless of the leadership style, there would be some common
as well as different educational programs which exist at each of the
K-12 grade levels. These common elements are hypothetically displayed
in Figure 4.

The hypothetical information on common educational programs
presented in Figure 4 is based not upon data but upon experience,
which suggests that in grades K-3 approximately S0% of the in-school
time is devoted to student learning of basic skills (reading, language
arts, and math). During the upper elementary, junior high, and high
school years the number of common objectives from one school district
to another would be more likely to decrease. The SEE Program could
only validly measure those objectives commonly shared throughout the
state. Local educational evaluation is then required to bridge the gap
between the state measurement of common objectives and the total
number of objectives in the local programs.

FIGURE 4
The Percentage of Common Educational Program Objectives
(by grade level, K-12)

It is estimated that teams of educators representative of the K-12
academic disciplines would need to be convened to screen out the
multitude of local objectives and to rephrase the common objectives
so they can be classified (according to Bloom) and measured later by
questions on the statewide evaluation. This list of measurable
objectives should then be coordinated with the state department of
education and local school district curriculum specialists before
being presented to the state board of education for endorsement.

During this period, decisions regarding the number of grade levels
and academic disciplines to be measured must be made. The funding
levels available for the SEE Program will in all likelihood cause a
compromise from an ideal distribution of grade levels and subject
matter to be measured to a more real level of effort.

Develop a Measurement and Sampling Strategy: The 1973 Educational
Testing Service report State Educational Assessment Programs
indicates testing programs in existence in the 50 states and
territories. These programs vary depending on the sources of funding
(Titles I and III ESEA, state statute for reading, state department of
education programs, etc.). Womer (1973) suggests the matrix sampling
employed by National Assessment as an efficient technique. Matrix
sampling is like selecting a series of separate, non-overlapping
samples of students, with each sample taking a different subset of the
total number of items to be administered (Womer, 1973, p. 70).






This technique has much to recommend it. Instead of asking each
student to devote 3 hours to take the SEE, three students could each
take non-overlapping items in one hour. While this technique increases
the total number of students to be involved in the project, it does so
with minimum interruption of the school program.
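
The arithmetic of matrix sampling can be illustrated with a short
sketch in Python. It is offered only as an illustration under stated
assumptions, not as the procedure Womer or National Assessment
prescribes: the function name matrix_sample, the 180-item pool, the 300
students, and the round-robin dealing of items and students are all
hypothetical choices made for the example.

```python
import random


def matrix_sample(student_ids, item_ids, n_forms, seed=0):
    """Give each student exactly one of n_forms non-overlapping item subsets.

    Hypothetical helper for illustration; not taken from the handbook.
    """
    rng = random.Random(seed)
    # Deal the items round-robin so the forms share no items
    # and are as nearly equal in length as possible.
    forms = [item_ids[i::n_forms] for i in range(n_forms)]
    # Shuffle the students, then deal them round-robin into
    # n_forms separate, non-overlapping samples.
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for position, student in enumerate(shuffled):
        assignment[student] = position % n_forms
    return forms, assignment


# Assumed figures: 300 sampled students and a 180-item pool
# that would take one student about three hours to complete.
students = [f"S{n:03d}" for n in range(1, 301)]
items = list(range(1, 181))
forms, assignment = matrix_sample(students, items, n_forms=3)
```

Under these assumptions each student answers 60 of the 180 items, and
every item is still answered by 100 students, so item-level results
remain available for the whole pool.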

Some states may choose to measure every student in specific grades
and gather information about each child three or more times during
the 13 years the child spends in the system. Figure 5 describes a
variable three-year cycle to evaluate students at each grade level once
every three years.



FIGURE 5
Three Year Evaluation Plan

Year       Grades measured    Grades not measured
Year I     3, 6, 9, 12        1, 2, 4, 5, 7, 8, 10, 11
Year II    2, 5, 8, 11        1, 3, 4, 6, 7, 9, 10, 12
Year III   1, 4, 7, 10        2, 3, 5, 6, 8, 9, 11, 12



The three-year evaluation plan described in Figure 5 allows two
intervening years to revise the educational program for a given grade
level before that grade is measured again. This three-year evaluation
plan would be more feasible at the local level (providing they used
norm-referenced tests) than at the state level. If this three-year cycle
were employed to develop four unique instruments each year, the
cost of such an undertaking would be prohibitive.

The decisions to go with sampling versus every-pupil techniques,
and whether the state elects to measure students at one or more grade
levels, need to be made early in the planning stage. Variations in the
measurement and sampling strategy later in the enterprise require not
only considerable adjustments, but also additional funds to defray
these changes.






Develop the Instruments: Test development is a very tedious,
time-consuming activity. Major testing companies employ large numbers
of test item writers on their staff, as well as college and university
faculty consultants. The interim director of SEE will have to devote
considerable care and attention to selecting appropriate staff members
for the purpose. He may decide to contract this portion of the
program to professional testing companies whose staff has this
expertise, but this approach is expensive.

Should he decide to attempt to generate his measuring instruments
within the expertise of state, local and university agencies, he has a
few procedural techniques which will tend to make the job easier.
You may recall that in the section dealing with learning, the Taxonomy
of Educational Objectives (Bloom and Krathwohl editions) was used
to describe the levels of learning in the cognitive and affective
domains. Using this taxonomy, the objectives to be measured can be
classified relative to the level of learning. The taxonomy also contains
sample test item formats which the authors purport are designed to
measure a given objective at the appropriate level. Thus, the
intelligent use of the taxonomy to develop questions to measure
objectives at a given level of learning results in an additional
criterion standard with which you can provide content validity for the
individual test items.

Womer cites the fact that National Assessment willingly makes
test items available to educators at no cost. The items, the objectives
they were designed to measure, and the ease indices (proportion of
students who scored correctly on the item) are made available by
National Assessment. It must be assumed that for the many assessment
instruments developed by National Assessment, a large number
of test items (with demonstrated item analysis) would be appropriate
for the SEE instrument. The need then exists to generate test items
which would measure all of the remaining objectives at the appropriate
level of learning.

To accomplish the writing of the test items, it is recommended that
a team of teachers be identified for a given academic discipline and
grade level. It is further recommended that an experienced test writer
or educational psychologist (with a background in psychometrics)
also be assigned to the team. It is his task to: (1) explain the
classification and test item rationale of the taxonomy to the team;
(2) advise them on the general guidelines for effective writing of
test items (no trick questions, no double negatives, don't ask a series
of questions where if the student can get one right, he can through
elimination answer the others, don't use the words always or never in
true-false questions because they invariably point to a false
response, etc.); and (3) serve as an evaluation consultant to the team.
After the above orientation, the team members will decide the specific
kinds and types of questions to ask, and will be responsible for
assuring that the specific vocabulary used in each question is
compatible with the reading level of the typical student in the grade
measured. Affective as well as cognitive questions should be contained
in the SEE instrument.

A separate team would be required to generate questions for each
academic discipline at each grade level. The items generated by a given
team should be reviewed (1) by other qualified teachers for that grade
level and subject, and (2) by the project staff.

The Pilot Study: When the SEE instrument is developed, it should be
pilot tested on a small group to assure that the items are valid and
that the subtests are reliable. The pilot sample should consist of a
few hundred students who are not to be included in the larger
statewide sample.

When the results are available, the test should undergo a thorough
item analysis. The item analysis will reveal the ease index for each
item, the specific responses which are not being selected and the
estimate of reliability. Traditional testing protocols would insist
that each subtest start with easy items and that the subtest get more
difficult as the student progresses. If the plan were to follow a
norm-referenced approach, the items could simply be ordered in terms of
increasing difficulty as a result of the item analysis. National
Assessment has found that the random presentation of items by
difficulty can often result in student reinforcement because as he goes
along he may encounter an easier question after a few hard ones, and
the reward of the easy question challenges him to continue. Each state
must decide for itself how it wishes to handle the ordering of items on
its SEE.

Implementing the Program: Once the sample and the measuring
instrument have been developed and refined, the interim director of SEE
can breathe just a little easier. The decision as to when the testing
would take place should already have been anticipated, discussed,
and finalized. Some states employ fall testing so the results can be
analyzed and reported back to the districts while the children are
enrolled in the same academic year. This practice has much to recommend
it. If the local educators find that some students in their district
experienced difficulty with certain concepts, these concepts could be
retaught before the child is promoted to the next higher grade.

Basically there are two crucial components to the implementation
of the SEE program. These are: distribute the instruments; and
collect, score and analyze the data.




Distribution of the Instruments: As part of the sampling procedure,
the SEE project staff should have informed the school districts
selected for inclusion in the project of the number of students
participating and the projected testing dates. The final instruments
and answer sheets should be packaged and mailed to the school districts
so they arrive in the local district at least one week prior to the
desired testing date.

Each district should have appointed a SEE project teacher (or
administrator) who would handle the logistics of explaining the testing
instructions to local district test monitors and assuring that each
monitor checked both test booklets and answer sheets at the close
of the testing session. The local SEE project teacher would then mail
the answer sheets and test booklets (under separate cover) to the
state SEE project staff.

Data Analysis: When the answer sheets arrive at the SEE project
office, they are checked for stray marks, multiple answers, etc. and
prepared for some form of machine scoring. The faster procedure
would be to have the answer sheets designed so they could be run
through optical scanning input devices for computers. Of course,
alternative scoring procedures could be employed, but they tend to
be more time consuming. The data on each answer sheet are then
prepared for computer processing; and, when all the answer sheets are
available, the first of a series of computer runs is undertaken.

The first run through the computer should merely tabulate the
frequency count for each item on the test. With these data, the
reliability of the test can be readily ascertained. Comparison of the
pilot-test data and the actual sample then can be made and differences
noted.
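
A first run of this kind might be sketched as follows. The sketch
assumes items scored 1 (correct) or 0 (incorrect), computes the ease
index of each item as the proportion of students answering correctly,
and estimates reliability with the Kuder-Richardson Formula 20. The
choice of KR-20 is an assumption made for illustration, since this
handbook does not name a particular reliability estimate; the function
names and the five-student data set are likewise hypothetical.

```python
def ease_indices(scores):
    """Proportion of students answering each item correctly."""
    n_students = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n_students
            for j in range(n_items)]


def kr20(scores):
    """Kuder-Richardson Formula 20 for items scored 0 or 1.

    Uses the population variance of the students' total scores.
    """
    n_students = len(scores)
    k = len(scores[0])
    p = ease_indices(scores)
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    if var_total == 0:
        raise ValueError("all students have the same total score")
    sum_pq = sum(pj * (1 - pj) for pj in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical scored answer sheets: one row per student,
# one column per item (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
ease = ease_indices(scores)
reliability = kr20(scores)
```

The same two functions could be applied to the pilot-test data and to
the statewide sample so that the ease indices and the reliability
estimates of the two administrations can be set side by side.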

Further runs might find the total sample partitioned on some
variable, e.g., sex, ethnic background, socio-economic status, district
size, college bound youngsters, vocational education students, etc.
The item analyses generated from these additional runs might reveal
areas of educational need, program strengths, etc.

Specific computer runs to group the item data by educational ob- 
jective would also be undertaken. The print-outs then would be 
analyzed and compared between the total group and each of the 
variables listed above. 

Additional statistical analysis such as factor analysis to see whether 
the instrument actually measured just what you thought it was meas- 
uring or whether additional factors emerge from the analysis should 
also be undertaken. 

Norms then would be developed on a total state, district size and/
or geographical region basis so local educators can appraise their
results in context with other similar districts.
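
One simple way to place a district among similar districts is a
percentile rank within its own size group. The sketch below is only an
illustration: the definition used (percent of reference scores below,
with ties counted as half), the district codes, the size groups and the
mean scores are all assumptions, and districts are identified by letter
codes only.

```python
def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    if not reference_scores:
        raise ValueError("reference group is empty")
    below = sum(1 for s in reference_scores if s < score)
    tied = sum(1 for s in reference_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(reference_scores)


# Hypothetical district mean scores, grouped by district size.
district_means = {
    "small": {"A": 50, "B": 60, "C": 70, "D": 80, "E": 90},
    "large": {"F": 55, "G": 65, "H": 75, "I": 85},
}


def rank_within_group(district, group):
    """Percentile rank of one district among districts of the same size."""
    scores = list(district_means[group].values())
    return percentile_rank(district_means[group][district], scores)


rank_c = rank_within_group("C", "small")
rank_f = rank_within_group("F", "large")
```

The same function could be run against a total-state or regional
reference group simply by changing the list of scores passed to it.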




Throughout the analyses, the anonymity of each school and district
must be scrupulously maintained.

When all the analyses have been completed, percentiles should be
derived for norm-referenced instruments, as well as learning quotients
if they are desired. For criterion-referenced measurement the average
ease index for each objective must be obtained, and a decision of
whether the students met the specific criterion must be made.
These data should then be displayed in summary fashion for use in
the SEE report.
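
The objective-level summary described above might be sketched as
follows. The item names, the objective labels, the ease indices and the
criterion of 0.70 are hypothetical values chosen for the example; each
state would set its own criterion for each objective.

```python
def objective_summary(item_ease, item_objective, criterion):
    """Average the ease indices of the items written for each objective
    and report whether that average reaches the criterion."""
    by_objective = {}
    for item, ease in item_ease.items():
        by_objective.setdefault(item_objective[item], []).append(ease)
    summary = {}
    for objective, values in by_objective.items():
        average = sum(values) / len(values)
        summary[objective] = (average, average >= criterion)
    return summary


# Hypothetical ease indices (proportion correct) for four items.
item_ease = {"item1": 0.85, "item2": 0.75, "item3": 0.50, "item4": 0.60}
# Hypothetical mapping of each item to the objective it measures.
item_objective = {
    "item1": "Reading 1",
    "item2": "Reading 1",
    "item3": "Math 1",
    "item4": "Math 1",
}
summary = objective_summary(item_ease, item_objective, criterion=0.70)
```

In this made-up case the reading objective averages about 0.80 and
meets the criterion, while the mathematics objective averages about
0.55 and does not.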



Disseminating the SEE Project Reports: The reports generated by the
SEE project must be written in different forms so the targeted
audience can read and understand them. A technical report replete with
statistical comparisons can be developed for educational researchers
and key educational administrators. An "English" version of the report
should be developed so parents, students and the public-at-large can
read and react to it.

The outline of each version of the SEE program report should
follow the taxonomy for educational decision-makers described in
the introduction of this handbook. The SEE evaluation data should
provide partial answers to:

1. What should our school graduates: know; be able to do;
   believe? What do they know?
2. To what extent should the schools: ameliorate class
   differences; emphasize individual excellence; insist on common
   minimum standards of performance; treat the exceptional
   student exceptionally?
3. What do our citizens believe: should be the goals of education;
   are the essential priorities of education; about the value of
   student preferences and needs?
4. What should we do about the contradictions affecting
   education: between freedom, progress and necessity; between goals
   of equality and goals of excellence; between what we believe
   and what we are doing; and between societal and individual
   needs and preferences?

If the SEE Program reports respond to this taxonomy of educational
questions, the achievements, the successes, the areas for improvement,
and the follow up activities come sharply and quickly into focus. The
report should provide the locus for innovation and improvement in
public education and allow you, the state board of education member,
to receive hard data so you can make more valid educational decisions
for the future.




CHAPTER VII 



SOME FINAL CONSIDERATIONS 



This handbook would not be complete without comment on some 
global considerations important to state board of education policy- 
making with respect to statewide educational evaluation. These com- 
ments may be viewed also as thinly disguised advice. 

The state board of education establishes policy; the chief state
school officer and the state education agency staff execute that
policy. This basic and oft-repeated tenet is doubly important when it
comes to assessment. State board members should find out all they
want to know about the programs. They should set the policy limits
and general timetable, and then delegate policy execution to the staff.
The state board of education should call for frequent progress
reports and expect results, but state board members need not dabble
in the day-to-day operational details. Attention to the broad overall
objectives will be sufficient. This will not only save a lot of board
time, but the staff will respect the confidence which has been placed
in them.

It is not essential that a state have a statewide educational
assessment program in order to have good schools - but an assessment
program may help in showing ways to improve those good schools.
For an assessment program is only one major school management
technique having potential for effectiveness; it is not a panacea. A
poorly conceived, underfinanced, and badly executed assessment
program is worse than no assessment program at all. Rather than
have a poor program, a state should perhaps look for current
activities and programs which can become components of an assessment
program and build from there. For example, almost every state
education agency collects much information from local districts that
is never used or analyzed at the state level; a bit of time and money
spent sifting through this for indications of school performance
might return bigger dividends in decision making information than
time and money spent in starting a new assessment program.

And speaking of decision-making, let's take a frank look at what it
really means. Decision making information might mean to the educator
those facts which substantiate the need for additional state
funds being provided to the schools. Decision making information
to the legislator might mean those facts that show high-cost-per-student
districts have students whose performance is less on the average
than districts which have low per pupil costs. An assessment program
whose objectives are set to confirm predesignated opinions like these
may be destined to failure.







Proponents of statewide educational evaluation apparently believe
that additional hard data, provided through scientific methods and
presented in an objective manner, will suddenly elevate
decision-making to a high-level, completely rational activity. It may
help in this regard, but let's face it: decision-making is still
accomplished by people, and people will still make decisions on how
they feel about things. The best assessment information in the world
will not change this.

A statewide educational evaluation program which first concentrates
on information for decision making at the local level probably
stands a better chance of initial acceptance and long-term success.
The choice of first serving local districts, administrators and boards
will indicate state board recognition that the most important
decisions affecting a child's education are made at the local level.
Those states which have given priority in recent years to the
development of management information systems at the local level will
find this investment paying off, since assessment procedures can
merely become an add-on to current programs.

It is important here to review the position on assessment taken by
chief state school officers in 1971. The policy statement was as
follows:

    The Council of Chief State School Officers recognizes the
    necessity for the assessment of education at the state and local
    level and urges the member states to support the development of
    assessment capability within the states.

    The Council reaffirms its commitment to a fuller understanding
    of the status and needs of American public education through
    educational assessment at the national level as conducted by the
    Education Commission of the States.

    (State & Federal Relationship in Education)

Perhaps the very first thing to do when considering an assessment
program is to define what it would mean in a particular state. Also,
define the related terms, like needs assessment, accountability, state
testing program, learner needs assessment, student performance
assessment, accreditation, district assessment, and student assessment.
Development of a glossary of terms containing these and other words
will contribute to understanding as discussions proceed to: Shall we
assess? Why assess? and What shall be assessed?

Is educational assessment here to stay? Certainly, but to put this
answer in perspective, remember that: (1) state boards of education
and state education agencies have been charged with reporting the
status of public education since their establishment; (2) the elements
of an assessment program have been around for quite a while; it's the
packaging of these elements to provide certain kinds of information
that's new; and (3) some new-sounding educational management
technique will be along in a year or so and overshadow the current
interest in assessment.

So examine assessment, use the idea and measurement techniques
in those situations where it is warranted, and don't expect it to be
the ultimate answer to anything. The admonition here to those
considering assessment is to suggest one ultimate question to be
answered: Will what we are doing help kids?

In this little volume assessment has been presented synonymously
with measurement. Statewide assessment has its advocates, its
advantages and its dangers. Taken by itself assessment without
evaluation is sterile, and if it doesn't lead to good decisions and
workable decisions, it may be anything from useless to evil. Assessment
cannot intelligently be used to judge the quality of individual
teachers. The factors of learning will not permit such naive
simplicity. Some will nevertheless advocate such nonsense.

A greater evil by far would be using assessment data to justify ar- 
bitrary standards for quality education which will early in school 
careers classify and categorize opportunities for individuals. The 
schools of the United States flowed from our democratic orientation 
and basic disdain for class structure. We must not succumb to the 
temptation to use education to stratify our people. We mustn't 
abandon the optimism and hope, perhaps the naivete, of a system 
open to all people for all their lives including the democratic right 
to fail in programs too difficult. 

Lastly, let us approach statewide evaluation with caution. Let us
not make glorious promises to our citizens which will fall short and
disappoint. We don't want statewide evaluation to become another
dead fad like progressive education, teaching machines, programmed
learning, flexible scheduling and other gimmicks oversold to the
public by zealous and gullible educators.






REFERENCES 



Ahmann, J. Stanley. The First Results, in Compact - Denver, Colorado:
Education Commission of the States - Vol. 6, No. 1, February 1972
(pp. 13-17)

Anastasi, Anne. Psychological Testing - New York - The Macmillan
Co. - 1961

Bloom, Benjamin S. (Ed.) Taxonomy of Educational Objectives
Handbook I: Cognitive Domain - New York - David McKay Co. Inc. - 1956
(pp. 201-207)

State Educational Assessment Programs - Princeton, N.J.:
Educational Testing Service - 1971

State Educational Assessment Programs - 1973 Revision -
Princeton, N.J.: Educational Testing Service - 1973

Bruno, Nancy L., Paul B. Campbell, and William H. Schabacker.
Statewide Assessment: Methods and Concerns - Princeton, N.J.: Center
for Statewide Educational Assessment, Educational Testing Service -
1972 (25 pp.)

Compact - Education Commission of the States, Vol. 6, No. 1, Feb.
1972 (issue on National Assessment: Measuring American Education)

Gardner, John William. Excellence, Can We Be Equal and Excellent
Too? - New York - Harper - 1961 (171 pp.)

Jencks, Christopher. Inequality: A Reassessment of the Effect of
Family and Schooling in America - New York - Basic Books - 1972
(399 pp.)

Krathwohl, David R. (Ed.) Taxonomy of Educational Objectives
Handbook II: Affective Domain - New York - David McKay Co.
Inc. - 1965 (pp. 176-185)

Implications for Education of Prospective Changes in Society -
Denver: Designing Education for the Future - 1967 (p. 195)

Myklebust, H.R. (Ed.) Progress in Learning Disabilities - Vol. 1 -
New York - Grune & Stratton - 1969

Nolte, M. Chester. An Introduction to School Administration: Selected
Readings - New York - The Macmillan Company - 1966 (p. 74)

Olson, Arthur R. and Edward H. Lyell. Educational Evaluation and
Assessment System for You - Denver: Colo. Department of Education -
1972 (48 pp. plus appendices)

Phi Delta Kappan, Vol. LII, No. 4 - Dec. 1970 (8 articles on
accountability)

Phi Delta Kappan, Vol. LIV, No. 1 - Sept. 1972

The Future as Metaphor: Educational Decision-Making - Seminar
Report: Regional Interstate Project Program - Denver: Colo. Dept. of
Education - 1972 (67 pp.)

Womer, Frank B. Developing a Large Scale Assessment Program -
Denver, Colo. - Cooperative Accountability Project Monograph - 1973




APPENDIX I 
DIRECTORY OF KEY SEA ASSESSMENT 
AND EVALUATION PERSONNEL 



ALABAMA 

Ledford L. Boone, Coordinator
Planning & Evaluation
State Department of Education
State Office Building
Montgomery, Alabama 36104

ALASKA

Ernest E. Policy, Coordinator
Planning & Research
State Department of Education
Pouch F, Juneau, Alaska 99801 

AMERICAN SAMOA 

Tyman L. Stephens 

Asst. Director Business Services 

Department of Education 

Pago Pago, American Samoa 96799 

ARIZONA 

William R. Raymond, Director 
Planning and Evaluation 
1535 W. Jefferson 
Phoenix, Arizona 85007 

ARKANSAS 

Dr. Sherman Peterson, Associate 
Director, Planning & Evaluation 
Department of Education 
Education Building 
Little Rock, Arkansas 72201 

CALIFORNIA 

William Bronson 
Office of Program Evaluation 
State Department of Education 
721 Capitol Mall 
Sacramento, California 95814 

CANAL ZONE

Dr. C. L. Latimer
Deputy Superintendent of Schools
Box 123
Balboa, Canal Zone

COLORADO

J. D. Hennes, Consultant
Planning and Evaluation
Department of Education
Denver, Colorado 80203

CONNECTICUT

R. Douglas Dopp
Special Projects Planning
Room 310 State Office Building
Connecticut State Department
of Education
Hartford, Connecticut 06115

DELAWARE

Robert A. Biglow
Planning, Research & Evaluation
Division
State Dept. of Public Instruction
Dover, Delaware 19901

DISTRICT OF COLUMBIA

Robert B. Farr, Director
Pupil Appraisal
415 12th Street, N.W.
Washington, D.C.

FLORIDA 

James C. Impara 
Administrative Education Acct. 
Department of Education 
Tallahassee, Rorida 

GEORGIA 

Sarah H. Moore 
Coordinator for Evaluation 
State Office Building 
Room 3 1 5 Annex 
Atlanta, Georgia 30303 

GUAM 

James B. Branch 
Administrative Head 
Planning & Evaluation Unit 
Department of Education 
P.O. Box DE 
Agana, Guam 96910



HAWAII 

Ronald L. Johnson, Administrator
Evaluation Section - OIS
Department of Education
1270 Queen Emma Street
Honolulu, Hawaii 96813

IDAHO 

Wayne A. Phillips, Prog. Admin. 
Planning, Development & Info. 
State Department of Education 
Len B. Jordan Building
Boise, Idaho 83720

ILLINOIS 

Thomas Springer, Director 
Statewide Assessment &
Evaluation Section
Office of Supt. of Pub. Instruction
325 South 5th Street 
Springfield, Illinois 

INDIANA 

Ivan Wagner, Director 
Planning and Evaluation 
State Office Building 
Room 803 
Indianapolis, Indiana 

IOWA 

Max Morrison, Director
Planning, Res. & Evaluation
Dept. of Public Instruction
Grimes Office Building
Des Moines, Iowa 50319

KANSAS 

Dr. Larry Casto, Asst. Comm.
Division of Development
State Department of Education
120 E. Hampden Street
Topeka, Kansas 66612

KENTUCKY

Donald S. Van Fleet
Director of Evaluation
State Department of Education
Frankfort, Kentucky 40601



LOUISIANA 

Katherine P. Finley, Director
Planning and Evaluation
Deputy Assoc. Supt.
of School Programs
P.O. Box 44064
State Department of Education
Baton Rouge, Louisiana 70804

MAINE 

Dr. Horace P. Maxcy, Jr. 
Planner 

Education Building 
Augusta, Maine 04330 

MARYLAND 

Richard K. McKay 
Asst. State Superintendent 
State Dept. of Education 
P.O. Box 8717
Friendship International Airport
Baltimore, Maryland 21240

MASSACHUSETTS

James F. Baker
Associate Commissioner
182 Tremont Street
Boston, Massachusetts 02111

MICHIGAN 

Robert J. Huyser
Supervisor, Assessment Program
Box 420
Lansing, Michigan 48902

MINNESOTA

John W. Adams, Director 
State Education Assessment 
731 Capitol Square Building 
St. Paul, Minnesota 55101 

MISSISSIPPI 

Jerry R. Hutchinson, Coor.
Office of Planning & Evaluation 
Post Office Box 771 
Jackson, Mississippi 39205 






MISSOURI 

John F. Allan, Director
Planning & Evaluation
Missouri State Dept. of Education
Jefferson City, Missouri 65101

MONTANA 

Mike Pichette
Reporting Services Coordinator
Office of Supt. of Pub. Instruction
Helena, Montana 59601

NEBRASKA 

Francis E. Colgan, Administrator
Planning, Evaluation & Research
State Department of Education
233 So. 10th Street
Lincoln, Nebraska 68508

NEVADA 

James Kiley, Assoc. Supt.
Division of Planning & Evaluation
Nevada State Dept. of Education
Carson City, Nevada 89701

NEW HAMPSHIRE 

R. Schweiker, Senior Consultant
Research and Testing Services
N. H. Department of Education
Concord, New Hampshire 03301

NEW JERSEY 

Bernard A. Kaplan, Director
Our Schools Prog., Division of Research,
Planning and Evaluation
State Department of Education
225 W. State Street
Trenton, New Jersey 08625

NEW MEXICO 

Alan Morgan, State Director
Evaluation and Assessment
Education Building
Santa Fe, New Mexico 87501

NEW YORK 

L. Woollatt, Assoc. Commissioner
N. Y. State Dept. of Education
Albany, New York 12224



NORTH CAROLINA 

William J. Brown, Director
Division of Research
N. C. Dept. of Public Instruction
Raleigh, North Carolina 27611

NORTH DAKOTA 

Lowell L. Jensen, Director
Div. of Planning & Devel.
State Dept. of Public Instruction
Bismarck, North Dakota 58501

OHIO 

Roger J. Lulow, Director
Div. of Planning & Evaluation
Room 615, 65 So. Front Street
Columbus, Ohio 43215

OKLAHOMA 

James Casey, Coordinator
Planning, Research, Evaluation
State Department of Education
State Capitol
Oklahoma City, Okla. 73105

OREGON

R. B. Clemmer, Coordinator
Planning & Evaluation
942 Lancaster Drive NE
Salem, Oregon 97310

PENNSYLVANIA 

Thomas E. Kendig, Chief
Division of Educational Quality
Assessment
Pennsylvania Dept. of Education
P.O. Box 911
Harrisburg, Pennsylvania 17126

PUERTO RICO

Marta Barros Loubriel, Acting
Dir., Evaluation Program
Urb. Tres Monjitas - Calle Teniente
Cesar Gonzales Calaf.
Hato Rey, Puerto Rico 00919






RHODE ISLAND 

Cynthia V. L. Ward
Education Research Specialist
Department of Education
Division of Research and
Evaluation, Room 210
Hayes Street
Providence, Rhode Island

SOUTH CAROLINA

Dr. W. E. Ellis, Director
Office of Research 
S. C. Department of Education 
Columbia, South Carolina 29201 

SOUTH DAKOTA 

Dr. Henry Kosters, Asst. Supt. 
Division of Elem. & Secon. Educ. 
Capitol Building 
Pierre, South Dakota 57501 

TENNESSEE

John N. Hooker
Director, Testing Services
U.T. at Knoxville
1000 White Avenue
Knoxville, Tennessee 37916

TEXAS 

Keith L. Cruse, Program Director 
Needs Assessment 
Texas Education Agency 
201 East 11th Street 
Austin, Texas 78701 

TRUST TERRITORY 

Thomas R. Brown 
Program & Research Officer 
Office of the High Commissioner
Trust Territory of the
Pacific Islands
Saipan, Marianna Islands 96950

UTAH

Stephen L. Murray 
Evaluation Specialist 
1400 University Club Building 
Salt Lake City, Utah 



VERMONT 

Dr. Herbert Tilley, Director 
Planning & Evaluation 
State Office Building 
Department of Education
Montpelier, Vermont 

VIRGINIA 

Dr. Charles C. Todd, Jr. 
Director of Planning 
State Department of Education 
Richmond, Virginia 23216 

VIRGIN ISLANDS 

Peter Rasmussen, Director 
Planning, Research & Evaluation 
Box 630, Dept. of Education 
Charlotte Amalie, St. Thomas
Virgin Islands 00801

WASHINGTON 

Alfred Rasp, Jr. 
Director, Program Evaluation 
Old Capitol Building 
Olympia, Washington 98504

WEST VIRGINIA

Dr. E.G. Pauley, Asst. Supt.
Bureau of Services & Federal Prog. 
Department of Education 
Capitol Complex, Bldg. 6-B 
Charleston, West Virginia 25305 

WISCONSIN 

James H. Gold
Division for Planning Services
Dept. of Public Instruction
126 Langdon Street
Madison, Wisconsin 53702 

WYOMING 

Paul D. Sandifer, Asst. Supt.
for Planning, Evaluation &
Information Services
State Department of Education 
Capitol Building 
Cheyenne, Wyoming 82001 






APPENDIX II



CONTRACTING AGENCIES USED BY STATE EDUCATION DEPARTMENTS* 
FOR MATTERS RELATING TO ASSESSMENT 

Alabama

Needs assessment: College of Education, University of Alabama

Alaska

Needs assessment: Northwest Regional Educational Laboratory, Portland, Oregon; Stanford Research Institute; Brookings Institution

Arizona

Analysis of data for reading achievement test: Southwest Research Associates
Needs assessment: Arizona State University
Needs assessment: EPIC Diversified Systems Corp., Tucson, Arizona
Needs assessment: Consulting Services Corp., Seattle, Washington

Arkansas

Needs assessment: EPIC Diversified Systems Corp., Tucson, Arizona

California

Scoring services for statewide testing program: California Test Bureau/McGraw Hill, Monterey, Calif.; School Testing Service, Berkeley, Calif.
PPBS: Peat, Marwick, Mitchell & Co. (management consulting firm)

Colorado

Assessment of learner needs (edited assessment exercises): University of Colorado, Laboratory of Educational Research
Assessment of learner needs (computer programming consulting): Pacific Educational Evaluation Systems (PEES)
Assessment of learner needs (specifications and computer programs to analyze responses): Automated Data Processing Service Center, State of Colorado
Assessment of learner needs (exercises to assess affective learning): Interstate Service Center

Connecticut

Needs assessment and educational goals: Institute for the Study of Inquiring Systems (ISIS), Philadelphia



*This information was gathered under the sponsorship of the Cooperative Accountability
Project, Denver, Colo. Arthur Olson, Director.






Delaware

Educational Testing Service, Princeton, N.J.

District of Columbia

Reading and mathematics testing: California Test Bureau/McGraw Hill

Florida

Statewide assessment program, instruments and objectives: Center for the Study of Evaluation (CSE), University of California at Los Angeles
Statewide assessment program, test scoring: Software Programming and Associates, Titusville, Florida
Catalogue of reading objectives: Florida State University

Illinois

Needs assessment (tests and scoring devices): Science Research Associates, Chicago, Illinois

Iowa

Needs assessment (criterion-referenced measures): Instructional Objectives Exchange, UCLA

Kansas

Needs assessment: Research and Grants Center, Kansas State Teachers College, Emporia
PPBS: Mid-Continent Regional Educational Laboratory, Kansas City, Mo.
Evaluation of vocational education: Teaching Research Division, Oregon State System of Higher Education

Kentucky

Needs assessment: EPIC Diversified Systems Corp.

Louisiana

Needs assessment: Research Division, Northwestern State College of Louisiana

Maine

Statewide assessment: Research Consortium for Educational Assessment (includes Research Triangle Institute, N.C.; Measurement Research Center, Iowa City, Iowa; and the American Institute for Research, Palo Alto, California)

Maryland

Instructional evaluation studies: Institute of Administrative Research, Teachers College, Columbia University
Goals: Automation Industries, Inc., Vitro Laboratories Division






Massachusetts

Fourth grade testing program: Educational Testing Service, Princeton, N.J.; CTB/McGraw Hill; Project Comprehensive Achievement Monitoring; Instructional Objectives Exchange, UCLA; Westinghouse Learning and Measurement Research Center, its subsidiary

Michigan

Michigan Educational Assessment Program: Educational Testing Service, Princeton, N.J.

Minnesota

Minnesota Educational Assessment Program: Research Triangle Institute, N.C.; Educational Management Services, Inc.; University of Minnesota (for consulting)

Missouri

Assessment of 4th & 6th grade skills: CTB/McGraw Hill
Statewide assessment program: Possibly Center for Education Assessment, Princeton, N.J.

Montana

Educational needs study: Arthur D. Little, Inc., Cambridge, Mass.

Nevada

Management information system: Dahl/Kramer, Project Consultants
Needs assessment, affective objectives and measures: Instructional Objectives Exchange, UCLA

New Jersey

Goals: Opinion Research Corp., Princeton, N.J.
Statewide assessment: Educational Testing Service, Princeton, N.J.

New Mexico

Needs assessment (does not look as though any contracting will be done for N.M.'s 1973 assessment, however): EPIC Diversified Systems Corp., Tucson, Arizona

New York

Accountability system for the N.Y.C. school system: Educational Testing Service, Princeton, N.J.
Performance Indicators in Education Program: University of the State of New York

North Carolina

State Assessment of Educational Progress: Research Triangle, N.C.






Ohio

Development of educational accountability model: Ohio State University's Evaluation Center; president of Ohio Council for Education employed on a contractual basis.

Oklahoma

Goals: College of Education, Oklahoma State University
Needs assessment: College of Education, Oklahoma State University

Oregon

Oregon Assessment of Educational Progress: Science Research Associates, Chicago, Ill.

Pennsylvania

Educational Quality Assessment: Educational Testing Service, Princeton, N.J.; Pennsylvania State University

Rhode Island

PPBS: Peat, Marwick, Mitchell & Co. (conducted seminar on PPBS sponsored by Department of Education)

South Dakota

PPBS: Applied Management Corporation

Tennessee

Needs assessment: Memphis State University, College of Education

Texas

Needs assessment - 6th grade reading - criterion-referenced instrument - also for 6th grade mathematics: CTB/McGraw Hill

Virginia

Needs assessment: Bureau of Educational Research, University of Virginia

Washington

Needs assessment: Consulting Services Corporation, Seattle, Washington

West Virginia

Needs assessment - objectives, identifying variables: Human Resources Institute, West Virginia University






Wyoming

Needs assessment and goals: Center for Research, Service, and Publication, College of Education, University of Wyoming



Other agencies and institutions which state boards might wish to consult on
evaluation matters include:

The American College Testing Program, P.O. Box 168, Iowa City, Iowa 52240
Foundation for Individualized Evaluation and Research, De Kalb, Illinois
General Learning Corporation, New York, New York

Education Commission of the States, Lincoln Tower, Denver, Colorado 80203
Grady Research Associates, 4604 El Camino, Colorado Springs, Colo. 80918
National Assessment of Educational Progress, Denver, Colorado
American Council on Education, 1 Dupont Circle, Washington, D.C. 20036

Note: Any listing here does not imply NASBE endorsement or recommendation.
State boards are advised to check on any consulting organization before
contracts are let.






